PORE: Positive-Only Relation Extraction from Wikipedia Text
Course URL: http://videolectures.net/iswc07_wang_por/
Lecturer: Gang Wang
Institution: Shanghai Jiao Tong University
Date: 2008-01-31
Course language: Simplified Chinese
Course description: Extracting semantic relations is of great importance for creating Semantic Web content. It is highly beneficial to semi-automatically extract relations from the free text of Wikipedia using the structured content readily available in it. Pattern-matching methods that rely on information redundancy do not work well, because Wikipedia contains little redundant information compared to the Web. Multi-class classification methods are not reasonable either, because no classification of relation types is available in Wikipedia. In this paper, we propose PORE (Positive-Only Relation Extraction) for relation extraction from Wikipedia text. The core algorithm, B-POL, extends a state-of-the-art positive-only learning algorithm with bootstrapping, strong negative identification, and transductive inference so that it works with fewer positive training examples. We conducted experiments on several relations with different amounts of training data. The experimental results show that B-POL works effectively given only a small number of positive training examples, and that it significantly outperforms the original positive-only learning approaches and a multi-class SVM. Furthermore, although PORE is applied in the context of Wikipedia, the core algorithm B-POL is a general approach to ontology population and can be adapted to other domains.
Keywords: information redundancy; Wikipedia; semantic extraction
Source: VideoLectures.NET
Last reviewed: 2019-04-30 (lxf)
Views: 73
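
The course description mentions learning from positive examples only, identifying strong negatives, and bootstrapping. The sketch below illustrates that general idea on toy feature vectors; it is not the B-POL algorithm from the paper. It assumes a simple centroid-plus-cosine scorer in place of whatever classifier the authors use, the names `strong_negatives` and `bootstrap_positive_only` and the `threshold` and `margin` values are hypothetical, and transductive inference is not modeled.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def strong_negatives(positives, unlabeled, threshold):
    """Unlabeled vectors that look very unlike the positive centroid."""
    pos_c = centroid(positives)
    return [u for u in unlabeled if cosine(u, pos_c) < threshold]


def bootstrap_positive_only(positives, unlabeled, threshold=0.3, margin=0.5, rounds=5):
    """Grow the positive set from seeds; thresholds are illustrative assumptions."""
    positives = list(positives)
    pool = list(unlabeled)
    for _ in range(rounds):
        negatives = strong_negatives(positives, pool, threshold)
        if not negatives:
            break
        pos_c, neg_c = centroid(positives), centroid(negatives)
        # Promote only confident candidates: much closer to positives than to negatives.
        promoted = [
            u for u in pool
            if u not in negatives and cosine(u, pos_c) - cosine(u, neg_c) > margin
        ]
        if not promoted:
            break
        positives.extend(promoted)
        pool = [u for u in pool if u not in promoted]
    return positives, pool


# Toy data: two seed positives and four unlabeled candidates.
seeds = [(1.0, 0.0), (0.9, 0.1)]
unlabeled = [(0.8, 0.2), (0.0, 1.0), (0.1, 0.9), (0.5, 0.5)]
positives, remaining = bootstrap_positive_only(seeds, unlabeled)
print(positives)
print(remaining)
```

In this toy run, candidates that resemble the seeds are added to the positive set, while ambiguous ones stay in the unlabeled pool rather than being forced into either class.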