Extending functional dependency to detect abnormal data in RDF graphs
Lecture URL: http://videolectures.net/iswc2011_yu_rdfgraphs/
Lecturer: Yang Yu
Institution: Lehigh University
Date: 2011-11-25
Language: English
Abstract: Data quality issues arise in the Semantic Web because data is created by diverse people and/or automated tools. In particular, erroneous triples may occur due to factual errors in the original data source, the acquisition tools employed, misuse of ontologies, or errors in ontology alignment. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by functional dependency, which has shown promise in database data quality research, we introduce value-clustered graph functional dependency to detect abnormal data in RDF graphs. To better deal with Semantic Web data, this extends the concept of functional dependency in several respects. First, there is the issue of scale, since we must consider the whole data schema instead of being restricted to a single database relation. Second, it deals with multi-valued properties, which lack the explicit value correlations that tuples provide in databases. Third, it uses clustering to consider classes of values. Focusing on these characteristics, we propose a number of heuristics and algorithms to efficiently discover the extended dependencies and use them to detect abnormal data. Experiments have shown that the system is efficient on multiple data sets and also detects many quality problems in real-world data.
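To make the core idea concrete (a triple is suspicious when it deviates from similar triples under a dependency), here is a minimal sketch in Python. It is not the authors' algorithm: it checks one given dependency `lhs -> rhs` rather than discovering dependencies, and it replaces their value clustering with a caller-supplied `value_class` function. The function name `find_abnormal`, the parameters `min_support` and `min_confidence`, and the `ex:` IRIs are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, Iterable

Triple = tuple[str, str, str]


def find_abnormal(
    triples: Iterable[Triple],
    lhs: str,
    rhs: str,
    value_class: Callable[[str], Hashable] = lambda v: v,
    min_support: int = 3,
    min_confidence: float = 0.8,
) -> list[Triple]:
    """Hypothetical sketch: return (s, rhs, o) triples that violate an
    approximate dependency lhs -> rhs.

    value_class maps an rhs value to its class (a stand-in for value
    clustering); the identity function gives a plain functional dependency.
    """
    # Collect all values per subject and property, so multi-valued
    # properties are handled without assuming any pairing between values.
    values: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p in (lhs, rhs):
            values[s][p].add(o)

    # For every lhs value, count the rhs value classes that co-occur with it.
    counts: dict[str, Counter] = defaultdict(Counter)
    for props in values.values():
        for x in props.get(lhs, ()):
            for y in props.get(rhs, ()):
                counts[x][value_class(y)] += 1

    # Keep a dependency only where one class clearly dominates.
    expected: dict[str, Hashable] = {}
    for x, counter in counts.items():
        total = sum(counter.values())
        cls, n = counter.most_common(1)[0]
        if total >= min_support and n / total >= min_confidence:
            expected[x] = cls

    # Triples whose rhs class deviates from the expected class are flagged.
    abnormal: set[Triple] = set()
    for s, props in values.items():
        for x in props.get(lhs, ()):
            if x not in expected:
                continue
            for y in props.get(rhs, ()):
                if value_class(y) != expected[x]:
                    abnormal.add((s, rhs, y))
    return sorted(abnormal)


# Hypothetical data: five resources in ex:Paris, one with a deviating country.
data = [(f"ex:a{i}", "ex:city", "ex:Paris") for i in range(1, 6)]
data += [(f"ex:a{i}", "ex:country", "ex:France") for i in range(1, 5)]
data.append(("ex:a5", "ex:country", "ex:Germany"))
print(find_abnormal(data, "ex:city", "ex:country"))
# [('ex:a5', 'ex:country', 'ex:Germany')]
```

The support and confidence thresholds are one simple way to tolerate noise: a dependency is trusted only when enough subjects share the lhs value and most of them agree on the rhs class. Passing a coarser `value_class` (for example, mapping years to decades) loosens the check from exact values to classes of values.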
Keywords: data quality; Semantic Web; value-clustered graph functional dependency
Source: VideoLectures.NET
Last reviewed: 2019-05-05 (lxf)