
Extending functional dependency to detect abnormal data in RDF graphs
Course URL: http://videolectures.net/iswc2011_yu_rdfgraphs/  
Lecturer: Yang Yu
Institution: Lehigh University
Date: 2011-11-25
Language: English
Course description: Data quality issues arise in the Semantic Web because data is created by diverse people and/or automated tools. In particular, erroneous triples may occur due to factual errors in the original data source, the acquisition tools employed, misuse of ontologies, or errors in ontology alignment. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by functional dependency, which has shown promise in database data quality research, we introduce value-clustered graph functional dependency to detect abnormal data in RDF graphs. To better deal with Semantic Web data, this approach extends the concept of functional dependency in several respects. First, there is the issue of scale, since we must consider the whole data schema instead of being restricted to one database relation. Second, it handles multi-valued properties, whose values lack the explicit correlations that tuples provide in databases. Third, it uses clustering to consider classes of values. Focusing on these characteristics, we propose a number of heuristics and algorithms to efficiently discover the extended dependencies and use them to detect abnormal data. Experiments have shown that the system is efficient on multiple data sets and also detects many quality problems in real-world data.
Keywords: data quality; Semantic Web; value-clustered graph functional dependency
Source: VideoLectures.NET
Last reviewed: 2019-05-05:lxf
Views: 45
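
Illustrative sketch: the description above proposes flagging a triple by how far it deviates from similar triples. The Python below is a minimal sketch of that general idea using a plain, approximate functional dependency between two properties. It is not the value-clustered graph functional dependency from the talk: it assumes single-valued properties, does no clustering, and checks only one hand-picked property pair. The function name `find_deviations`, the `min_support` threshold, and the `ex:` sample triples are all hypothetical.

```python
from collections import Counter, defaultdict


def find_deviations(triples, determinant, dependent, min_support=0.75):
    """Flag subjects whose `dependent` value disagrees with the dominant
    value among subjects that share the same `determinant` value.

    Assumption: each subject has at most one value per property.
    """
    # Collect the two properties of interest for every subject.
    values = defaultdict(dict)
    for subject, prop, obj in triples:
        if prop in (determinant, dependent):
            values[subject][prop] = obj

    # Group subjects by their determinant value.
    groups = defaultdict(list)
    for subject, props in values.items():
        if determinant in props and dependent in props:
            groups[props[determinant]].append((subject, props[dependent]))

    flagged = []
    for det_value, members in groups.items():
        counts = Counter(dep for _, dep in members)
        top_value, top_count = counts.most_common(1)[0]
        # Report deviations only when the dependency mostly holds.
        if top_count / len(members) >= min_support:
            flagged.extend(
                (subject, det_value, dep, top_value)
                for subject, dep in members
                if dep != top_value
            )
    return sorted(flagged)


# Hypothetical sample data: city is expected to determine country.
triples = [
    ("ex:a", "ex:city", "Paris"), ("ex:a", "ex:country", "France"),
    ("ex:b", "ex:city", "Paris"), ("ex:b", "ex:country", "France"),
    ("ex:c", "ex:city", "Paris"), ("ex:c", "ex:country", "France"),
    ("ex:d", "ex:city", "Paris"), ("ex:d", "ex:country", "France"),
    ("ex:e", "ex:city", "Paris"), ("ex:e", "ex:country", "Germany"),
    ("ex:f", "ex:city", "Berlin"), ("ex:f", "ex:country", "Germany"),
]

suspects = find_deviations(triples, "ex:city", "ex:country")
for subject, det_value, found, expected in suspects:
    print(f"{subject}: {det_value} -> {found} (most similar triples say {expected})")
```

Each flagged tuple holds the subject, the shared determinant value, the deviating value, and the dominant value. Raising `min_support` makes the check stricter about which dependencies count as reliable enough to report against.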