


Scalable Knowledge Harvesting with High Precision and High Recall
Course URL: http://videolectures.net/wsdm2011_nakashole_skh/
Lecturer: Ndapandula Nakashole
Institution: Max Planck Institute
Date: 2011-08-09
Language: English
Course description: Harvesting relational facts from Web sources has received great attention for automatically constructing large knowledge bases. State-of-the-art approaches combine pattern-based gathering of fact candidates with constraint-based reasoning. However, they still face major challenges regarding the trade-offs between precision, recall, and scalability. Techniques that scale well are susceptible to noisy patterns that degrade precision, while techniques that employ deep reasoning for high precision cannot cope with Web-scale data. This paper presents a scalable system, called PROSPERA, for high-quality knowledge harvesting. We propose a new notion of ngram-itemsets for richer patterns, and use MaxSat-based constraint reasoning on both the quality of patterns and the validity of fact candidates. We compute pattern-occurrence statistics for two benefits: they serve to prune the hypothesis space and to derive informative weights of clauses for the reasoner. The paper shows how to incorporate these building blocks into a scalable architecture that can parallelize all phases on a Hadoop-based distributed platform. Our experiments with the ClueWeb09 corpus include comparisons to the recent ReadTheWeb experiment. We substantially outperform these prior results in terms of recall, with the same precision, while having low run-times.
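To make the two building blocks named in the abstract concrete, here is a minimal, heavily simplified sketch of the ngram-itemset idea combined with pattern-occurrence statistics. All data, names, and the confidence formula are illustrative assumptions for this sketch; this is not PROSPERA's actual implementation, which runs at Web scale on Hadoop and feeds the resulting weights into a MaxSat reasoner.

```python
# Sketch: represent relation patterns as sets of word n-grams ("ngram-itemsets")
# rather than exact phrases, then use occurrence statistics against seed facts
# to weight each pattern. Toy data throughout; all names are hypothetical.
from collections import Counter

# Assumed pre-extracted occurrences: (subject entity, connecting phrase, object entity).
OCCURRENCES = [
    ("Max Planck", "was born in", "Kiel"),
    ("Alan Turing", "was born in", "London"),
    ("Max Planck", "received his doctorate in", "Munich"),
    ("Alan Turing", "studied at", "Cambridge"),
]

# Seed facts for one target relation (here: bornIn).
SEEDS = {("Max Planck", "Kiel"), ("Alan Turing", "London")}

def ngram_itemset(phrase, n=2):
    """Represent a connecting phrase as the set of its word n-grams.
    Unlike an exact-phrase pattern, an itemset can still match when
    extra words are interspersed in a new sentence."""
    words = phrase.split()
    return frozenset(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

# Pattern-occurrence statistics: how often each itemset co-occurs with a
# seed fact, versus how often it occurs at all.
support, total = Counter(), Counter()
for subj, phrase, obj in OCCURRENCES:
    items = ngram_itemset(phrase)
    total[items] += 1
    if (subj, obj) in SEEDS:
        support[items] += 1

# Confidence-style weight per pattern. In the full system, weights like
# these would (a) prune low-support hypotheses and (b) become clause
# weights for the MaxSat-based reasoner.
weights = {p: support[p] / total[p] for p in total}
for pattern, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), round(w, 2))
```

With the toy data above, the itemset {"was born", "born in"} co-occurs only with seed facts and receives weight 1.0, while the other patterns receive weight 0.0; the same statistics, computed over a Web-scale corpus, are what let the system both prune the hypothesis space and pass informative weights to the reasoner.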
Keywords: knowledge bases; resources; computer science
Source: VideoLectures.NET
Last reviewed: 2020-09-24: dingaq
Views: 93