Extracting Meta Statements from the Blogosphere
Course URL: http://videolectures.net/icwsm2011_mesquita_blogosphere/
Lecturer: Filipe Mesquita
Institution: University of Alberta
Date: 2011-08-18
Language: English
Course description: Information extraction systems have recently been proposed for organizing and exploring content in large online text corpora as information networks. In such networks, the nodes are named entities (e.g., people, organizations), while the edges correspond to statements indicating relations among those entities. To date, such systems extract rather primitive networks, capturing only those relations that are expressed by direct statements. In many applications, it is useful to also extract more subtle relations, which are often expressed as meta statements in the text. These can, for instance, provide the context for a statement (e.g., “Google acquired YouTube in October 2006”) or the repercussions of a statement (e.g., “The US condemned Russia’s invasion of Georgia”). In this work, we report on a system for extracting relations expressed in both direct statements and meta statements. We propose a method based on Conditional Random Fields that explores syntactic features to extract both kinds of statements seamlessly. We follow the Open Information Extraction paradigm, where a classifier is trained to recognize any type of relation instead of specific ones. Finally, our results show substantial improvements over a state-of-the-art information extraction system, both in terms of accuracy and, especially, recall.
Keywords: online text corpora; Conditional Random Fields; information extraction
Source: VideoLectures.NET
Last reviewed: 2019-04-27:lxf
Views: 26