Detecting Errors in Numerical Linked Data using Cross-Checked Outlier Detection
Course URL: http://videolectures.net/iswc2014_fleischhacker_detecting_errors/
Lecturer: Daniel Fleischhacker
Institution: University of Mannheim
Date: 2014-12-19
Language: English
Course description: Outlier detection for identifying wrong values in data is typically applied to a single dataset, searching it for values that behave unexpectedly. In this work, we instead propose an approach that combines the outcomes of two independent outlier detection runs to obtain a more reliable result and to avoid problems caused by natural outliers, that is, values that are exceptional in the dataset but nevertheless correct. Linked Data is especially well suited to this idea, since it provides large amounts of data enriched with hierarchical information and contains explicit links between instances. In the first step, we apply outlier detection methods to the property values extracted from a single repository, using a novel approach for splitting the data into relevant subsets. In the second step, we exploit the owl:sameAs links of the instances to obtain additional property values and perform a second outlier detection on these values. Doing so allows us to confirm or reject the assessment of a value as wrong. Experiments on the DBpedia and NELL datasets demonstrate the feasibility of our approach.
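
The two-step idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the interquartile-range (IQR) test stands in for whichever outlier detection methods the lecture actually uses, the subset-splitting step is omitted, and all instance names, values, and helper names (`iqr_outliers`, `cross_check`) are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumption chosen for brevity; any other
    outlier detection method could be substituted here.
    """
    if len(values) < 4:
        return set()  # too few values to judge
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < low or v > high}


def cross_check(primary, linked, min_linked=3):
    """Cross-check outliers from one repository against linked values.

    primary: instance -> value of one property in a single repository.
    linked:  instance -> values of the same property obtained by
             following owl:sameAs links (hypothetical, pre-fetched).
    """
    names = list(primary)
    first_run = iqr_outliers([primary[n] for n in names])
    verdicts = {}
    for i in sorted(first_run):
        name = names[i]
        others = linked.get(name, [])
        if len(others) < min_linked:
            verdicts[name] = "unconfirmed"
            continue
        # Second run: the primary value sits at index 0 among its
        # linked counterparts.
        second_run = iqr_outliers([primary[name]] + others)
        verdicts[name] = "likely error" if 0 in second_run else "natural outlier"
    return verdicts


# Hypothetical property values from a single repository.
primary = {
    "ex:A": 100, "ex:B": 105, "ex:C": 110, "ex:D": 115,
    "ex:E": 120, "ex:F": 125, "ex:G": 130, "ex:H": 135,
    "ex:Metropolis": 900,   # exceptional but correct
    "ex:Typoville": 1250,   # wrong value in this repository
}
# Hypothetical values reached through owl:sameAs links.
linked = {
    "ex:Metropolis": [895, 902, 910, 898],
    "ex:Typoville": [125, 124, 126, 125],
}

result = cross_check(primary, linked)
print(result)
```

Both exceptional values are flagged by the first run, but only the one that also disagrees with its linked counterparts is kept as a likely error; the other is treated as a natural outlier.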
Keywords: outlier detection; data identification; wrong-value rejection
Source: VideoLectures.NET
Data collected: 2021-06-27:zyk
Last reviewed: 2021-06-27:zyk