TruePIE: Discovering Reliable Patterns in Pattern-Based Information Extraction |
|
Course URL: | http://videolectures.net/kdd2018_li_truepie_patterns/ |
Lecturer: | Qi Li |
Institution: | Department of Computer Science, University of Illinois at Urbana-Champaign |
Date: | 2018-11-23 |
Language: | English |
Course description: | Pattern-based methods have been successful in information extraction and NLP research. Previous approaches learn the quality of a textual pattern as its relatedness to a certain task, based on statistics of its individual content (e.g., length, frequency) and hundreds of carefully annotated labels. However, patterns of good content quality may generate heavily conflicting information because of the large gap between relatedness and correctness. Evaluating the correctness of information is critical in (entity, attribute, value)-tuple extraction. In this work, we propose a novel method, called TruePIE, that finds reliable patterns, which extract information that is not only related but also correct. TruePIE adopts a self-training framework and repeats the training-predicting-extracting process to gradually discover more and more reliable patterns. To better represent the textual patterns, pattern embeddings are formulated so that patterns with similar semantic meanings are embedded close to each other. The embeddings jointly consider the local pattern information and the distributional information of the extractions. To overcome the lack of supervision on pattern reliability, TruePIE automatically generates high-quality training patterns from a couple of seed patterns by applying arity constraints to distinguish highly reliable patterns (i.e., positive patterns) from highly unreliable patterns (i.e., negative patterns). Experiments on a large news dataset (over 25 GB) demonstrate that TruePIE significantly outperforms baseline methods on each of three tasks: reliable tuple extraction, reliable pattern extraction, and negative pattern extraction. |
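Keywords: | information extraction and NLP research; statistics of carefully annotated labels; TruePIE; reliable tuple extraction |
Source: | VideoLectures.NET |
Data collected: | 2023-01-27:cyh |
Last reviewed: | 2023-01-27:cyh |
Views: | 45 |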
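
The description mentions using arity constraints to separate highly reliable patterns from highly unreliable ones. The sketch below is only a toy illustration of that idea, not the method from the talk. It assumes a single-valued attribute (each entity has exactly one correct value), a small dictionary of trusted facts obtained from seed patterns, and two arbitrary thresholds. All function names, pattern strings, entity names, and threshold values are hypothetical.

```python
def conflict_rate(extractions, trusted):
    """Fraction of a pattern's (entity, value) tuples that contradict trusted facts.

    Only tuples whose entity appears in `trusted` are checked, because the
    others can be neither confirmed nor refuted. Assumes the attribute is
    single-valued: one correct value per entity.
    """
    checked = [(e, v) for e, v in extractions if e in trusted]
    if not checked:
        return None  # no overlap with trusted facts, so no evidence either way
    conflicts = sum(1 for e, v in checked if trusted[e] != v)
    return conflicts / len(checked)


def label_patterns(pattern_extractions, trusted, low=0.2, high=0.8):
    """Label each pattern as 'positive', 'negative', or 'unknown'.

    `low` and `high` are illustrative thresholds on the conflict rate.
    """
    labels = {}
    for pattern, tuples in pattern_extractions.items():
        rate = conflict_rate(tuples, trusted)
        if rate is None:
            labels[pattern] = "unknown"
        elif rate <= low:
            labels[pattern] = "positive"
        elif rate >= high:
            labels[pattern] = "negative"
        else:
            labels[pattern] = "unknown"
    return labels


# Toy data: trusted (country -> president) facts, e.g. from seed patterns.
trusted = {"CountryA": "PersonX", "CountryB": "PersonY"}

# Tuples extracted by each candidate pattern (made-up examples).
pattern_extractions = {
    "president $PERSON of $COUNTRY": [("CountryA", "PersonX"), ("CountryB", "PersonY")],
    "$PERSON visited $COUNTRY": [("CountryA", "PersonY"), ("CountryB", "PersonX")],
    "$PERSON , leader of $COUNTRY": [("CountryC", "PersonZ")],
}

labels = label_patterns(pattern_extractions, trusted)
for pattern, label in labels.items():
    print(f"{label:8s} {pattern}")
```

In this toy setting, a pattern that agrees with the trusted facts is labeled positive, one that mostly contradicts them is labeled negative, and one with no overlapping entities stays unlabeled. In a self-training loop of the kind the description outlines, such labels would serve as training examples for a classifier over pattern embeddings.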