
Link prediction for annotation graph datasets using graph summarization
Course URL: http://videolectures.net/iswc2011_thor_graph/
Lecturer: Andreas Thor
Institution: University of Maryland
Date: 2011-11-25
Language: English
Course description: Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences, where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and Semantic Web technologies are playing a leading role in making such datasets widely available. Scientists can mine these datasets to discover patterns of annotation. While ontology alignment and integration across datasets have been explored in the context of the Semantic Web, there is currently no approach for mining such patterns in annotation graph datasets. In this paper, we propose a novel approach for link prediction, which is a preliminary task when discovering more complex patterns. Our prediction is based on a complementary methodology of graph summarization (GS) and dense subgraphs (DSG). GS can exploit and summarize knowledge captured within the ontologies and in the annotation patterns. DSG uses the ontology structure, in particular the distance between CV terms, to filter the graph and to find promising subgraphs. We develop a scoring function based on multiple heuristics to rank the predictions. We perform an extensive evaluation on Arabidopsis thaliana genes.
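Illustrative sketch: the description above treats missing gene-to-term edges inside a dense region of the annotation graph as candidate predictions. The Python snippet below is a minimal illustration of that general idea only, not the method presented in the lecture. The function name `predict_links`, the toy gene and term identifiers, and the use of plain edge density as a score are all assumptions made for this example.

```python
from itertools import product


def predict_links(annotations, genes, terms):
    """Return (density, missing_pairs) for a candidate bipartite subgraph.

    annotations: dict mapping a gene id to the set of CV terms annotating it.
    genes, terms: the node sets that define the candidate subgraph.
    Hypothetical helper for illustration; density stands in for a real score.
    """
    # Edges that actually exist between the chosen genes and terms.
    present = {
        (g, t)
        for g in genes
        for t in terms
        if t in annotations.get(g, set())
    }
    total = len(genes) * len(terms)
    density = len(present) / total if total else 0.0
    # Every absent gene-term pair in a dense subgraph is a candidate link.
    missing = [
        (g, t)
        for g, t in product(sorted(genes), sorted(terms))
        if (g, t) not in present
    ]
    return density, missing


# Toy data (made up): geneB lacks term3, which its neighbours both have.
annotations = {
    "geneA": {"term1", "term2", "term3"},
    "geneB": {"term1", "term2"},
    "geneC": {"term1", "term2", "term3"},
}
density, candidates = predict_links(
    annotations,
    {"geneA", "geneB", "geneC"},
    {"term1", "term2", "term3"},
)
print(density, candidates)
```

In this toy graph the only missing pair is geneB with term3, so it is the single candidate. A fuller treatment would also filter terms by ontology distance and combine several heuristics into the ranking, as the description notes.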
Keywords: annotation graph datasets; genes and proteins; Semantic Web
Source: VideoLectures.NET
Last reviewed: 2019-05-05:lxf
Read count: 54