Automatically Generating Data Linkages Using a Domain-Independent Candidate Selection Approach (MOOC overseas open course)




Course URL:	http://videolectures.net/iswc2011_song_linkages/
Lecturer:	Dezhao Song
Institution:	Lehigh University
Date:	2011-11-25
Language:	English
Course description:	One challenge for Linked Data is scalably establishing high quality owl:sameAs links between instances (e.g., people, geographical locations, publications, etc.) in different data sources. Traditional approaches to this entity coreference problem do not scale because they exhaustively compare every pair of instances. In this paper, we propose a candidate selection algorithm for pruning the search space for entity coreference. We select candidate instance pairs by computing a character-level similarity on discriminating literal values that are chosen using domain-independent unsupervised learning. We index the instances on the chosen predicates’ literal values to efficiently look up similar instances. We evaluate our approach on two RDF and three structured datasets. We show that the traditional metrics don’t always accurately reflect the relative benefits of candidate selection, and propose additional metrics. We show that our algorithm frequently outperforms alternatives and is able to process 1 million instances in under one hour on a single Sun Workstation. Furthermore, on the RDF datasets, we show that the entire entity coreference process scales well by applying our technique. Surprisingly, this high recall, low precision filtering mechanism frequently leads to higher F-scores in the overall system.
Keywords:	Linked Data; candidate selection algorithm; unsupervised learning
Source:	VideoLectures.NET
Last reviewed:	2019-05-05 (lxf)
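
The candidate selection idea in the course description (index instances on a literal value, look up similar instances, and keep only pairs whose character-level similarity is high enough) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the discriminating predicate has already been chosen, so each instance is reduced to a single literal string, and it swaps in Jaccard similarity over character trigrams as a stand-in for the similarity measure used in the talk. The function names, the `threshold` value, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams of a literal value."""
    s = " ".join(text.lower().split())  # normalise case and whitespace
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_index(instances, n=3):
    """Inverted index: n-gram -> ids of instances whose literal contains it."""
    index = defaultdict(set)
    for inst_id, literal in instances.items():
        for gram in char_ngrams(literal, n):
            index[gram].add(inst_id)
    return index


def select_candidates(instances, threshold=0.5, n=3):
    """Return instance pairs whose n-gram Jaccard similarity >= threshold.

    Only pairs that share at least one n-gram in the index are scored,
    so most of the exhaustive pairwise comparisons are never made.
    (Assumption: Jaccard over trigrams stands in for the real measure.)
    """
    index = build_index(instances, n)
    grams = {i: char_ngrams(lit, n) for i, lit in instances.items()}

    # Pairs that co-occur in at least one posting list.
    co_occurring = set()
    for ids in index.values():
        co_occurring.update(combinations(sorted(ids), 2))

    candidates = set()
    for a, b in co_occurring:
        union = grams[a] | grams[b]
        if len(grams[a] & grams[b]) / len(union) >= threshold:
            candidates.add((a, b))
    return candidates


# Hypothetical toy data: instance id -> literal of the chosen predicate.
toy_instances = {
    "p1": "Dezhao Song",
    "p2": "Dezhao  song",
    "p3": "Jeff Heflin",
    "p4": "D. Song",
}
toy_candidates = select_candidates(toy_instances, threshold=0.5)
```

On this toy data only the pair `("p1", "p2")` survives, so one of the six possible pairs is passed on to the more expensive coreference step. A lower `threshold` trades precision for recall, which matches the high-recall, low-precision filtering role described above.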
