
Mining the Web to Facilitate Fast and Accurate Approximate Match
Course URL: http://videolectures.net/www09_xin_mtw/
Lecturers: Venkatesh Ganti; Surajit Chaudhuri; Dong Xin
Institution: Microsoft
Date: 2009-05-20
Language: English
Abstract: Tasks that rely on recognizing entities have recently received significant attention in the literature. Many such tasks assume the existence of reference entity tables. In this paper, we consider the problem of determining whether a candidate string approximately matches a reference entity. This problem is important for extracting named entities, such as products or locations, that are listed in a reference entity table, and for matching entity entries across heterogeneous sources. Prior approaches have relied on string-based similarity, which compares only the candidate string and the entity it is matched against. In this paper, we observe that considering evidence of a match across multiple documents significantly improves matching accuracy. We develop efficient techniques that exploit web search engines to facilitate approximate matching under our proposed similarity functions. In an extensive experimental evaluation, we demonstrate the accuracy and efficiency of our techniques.
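The abstract contrasts string-only similarity with evidence gathered across documents. The sketch below illustrates that contrast under loudly stated assumptions. A small in-memory list of invented sentences (`DOCS`) stands in for web search results. `jaccard` is a generic token-set similarity. `doc_support` is a hypothetical document co-occurrence score. Neither function is the similarity function proposed in the talk. In this sketch, two candidates tie under string similarity, but the document evidence favors the less ambiguous one.

```python
import re
from typing import Iterable, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(candidate: str, entity: str) -> float:
    """String-only similarity: Jaccard overlap of candidate and entity token sets."""
    a, b = tokens(candidate), tokens(entity)
    return len(a & b) / len(a | b) if a | b else 0.0


def doc_support(candidate: str, entity: str, docs: Iterable[str]) -> float:
    """Hypothetical document-based score (illustration only): among documents
    containing every candidate token, the fraction that also contain every
    entity token."""
    cand, ent = tokens(candidate), tokens(entity)
    with_cand = [d for d in map(tokens, docs) if cand <= d]
    if not with_cand:
        return 0.0
    return sum(ent <= d for d in with_cand) / len(with_cand)


# Invented toy data standing in for search-engine results (assumption).
ENTITY = "Canon EOS Digital Rebel XTi 400D"
DOCS = [
    "Canon EOS Digital Rebel XTi 400D review",
    "Canon 400D, also sold as the EOS Digital Rebel XTi",
    "Canon 400D battery grip",
    "Canon EOS 5D full frame",
    "Canon EOS 40D weather sealing",
]

for cand in ("Canon 400D", "Canon EOS"):
    print(cand, round(jaccard(cand, ENTITY), 3), round(doc_support(cand, ENTITY, DOCS), 3))
```

Both candidates share two of the six entity tokens, so `jaccard` scores each at 0.333. The documents mentioning "Canon 400D" co-occur with the full entity name more often (0.667) than those mentioning the ambiguous "Canon EOS" (0.5). A real system would obtain the documents from a search engine rather than from a fixed list.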
Keywords: web; strings; approximate matching; search engines
Source: VideoLectures.NET
Last edited: 2020-06-08, by 吴雨秋 (volunteer course editor)