Enriching Knowledge Bases with Counting Quantifiers
Course URL: http://videolectures.net/iswc2018_mirza_enriching_knowledge_bases...
Lecturer: Paramita Mirza
Institution: Max Planck Institute for Informatics
Date: 2018-10-22
Language: English
Course description: Information extraction traditionally focuses on extracting relations between identifiable entities. Yet texts often also contain counting information, stating that a subject is in a specific relation with a number of objects without mentioning the objects themselves, for example, "California is divided into 58 counties". Such counting quantifiers can help in a variety of tasks, such as query answering or knowledge base curation, but have been neglected by prior work. This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.
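Illustrative sketch (not part of the lecture material): the distant-supervision idea in the description can be pictured with a toy example in which a knowledge-base fact count serves as a training seed for labeling numbers found in text. The function name `label_count_mentions` and the rule of treating the seed as a lower bound are assumptions made here to reflect possible knowledge-base incompleteness; they are not CINEX's actual method.

```python
import re


def label_count_mentions(sentence, kb_fact_count):
    """Tag each integer in the sentence as a candidate count mention.

    Hypothetical heuristic (an assumption, not taken from CINEX): because the
    knowledge base may be incomplete, its fact count is treated as a lower
    bound, so any number >= that count is labeled a positive example.
    """
    labels = []
    for match in re.finditer(r"\d+", sentence):
        value = int(match.group())
        labels.append((value, value >= kb_fact_count))
    return labels


# Toy seed: suppose the knowledge base lists only 50 of California's counties.
examples = label_count_mentions("California is divided into 58 counties", 50)
print(examples)
```

Each returned pair holds the number found and whether it would count as a positive training example under this assumed rule.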
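Keywords: query answering or knowledge base curation; system for extracting counting information; high diversity of linguistic patterns; existing Wikidata facts
Source: VideoLectures.NET
Data collected: 2022-12-30:cyh
Last reviewed: 2023-05-15:cyh
Views: 34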