Coupled Semi-Supervised Learning for Information Extraction
Course URL: http://videolectures.net/wsdm2010_carlson_cssl/
Lecturer: Andrew Carlson
Institution: Carnegie Mellon University
Date: 2010-03-18
Language: English
Course description: We consider the problem of semi-supervised learning to extract categories (e.g., academic fields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised training using only a few labeled examples is typically unreliable because the learning task is underconstrained. This paper pursues the thesis that much greater accuracy can be achieved by further constraining the learning task, by coupling the semi-supervised training of many extractors for different categories and relations. We characterize several ways in which the training of category and relation extractors can be coupled, and present experimental results demonstrating significantly improved accuracy as a result.
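As an illustration of why coupling helps, here is a minimal sketch of pattern-based bootstrapping for two categories. It is not the authors' system. It assumes one particular coupling constraint, mutual exclusion between categories, and the function name `coupled_bootstrap`, the context patterns, and the toy data are all hypothetical.

```python
from collections import defaultdict


def coupled_bootstrap(seeds, occurrences, exclusive, rounds=2):
    """Toy bootstrapping of category extractors (illustrative only).

    seeds:       {category: set of seed instances}
    occurrences: (context_pattern, noun_phrase) pairs from unlabeled text
    exclusive:   set of frozensets, each naming two mutually exclusive categories
    """
    promoted = {cat: set(insts) for cat, insts in seeds.items()}
    for _ in range(rounds):
        # 1. Each category collects the patterns seen with its known instances.
        patterns = defaultdict(set)
        for pattern, phrase in occurrences:
            for cat, insts in promoted.items():
                if phrase in insts:
                    patterns[cat].add(pattern)
        # 2. Each category proposes every phrase matched by one of its patterns.
        candidates = defaultdict(set)
        for pattern, phrase in occurrences:
            for cat in promoted:
                if pattern in patterns[cat]:
                    candidates[cat].add(phrase)
        # 3. Coupling: reject a candidate that a mutually exclusive category
        #    also proposes or already contains.
        accepted = defaultdict(set)
        for cat in promoted:
            for phrase in candidates[cat]:
                conflict = any(
                    other != cat
                    and frozenset((cat, other)) in exclusive
                    and (phrase in candidates[other] or phrase in promoted[other])
                    for other in promoted
                )
                if not conflict:
                    accepted[cat].add(phrase)
        # Update all categories together so the result is order-independent.
        for cat in promoted:
            promoted[cat] |= accepted[cat]
    return promoted


# Invented toy corpus: "fans of _" is ambiguous between athletes and sports.
occurrences = [
    ("_ scored a goal", "Messi"),
    ("_ scored a goal", "Ronaldo"),
    ("fans of _", "Ronaldo"),
    ("fans of _", "soccer"),
    ("fans of _", "tennis"),
    ("league for _", "soccer"),
    ("league for _", "hockey"),
]
seeds = {"athlete": {"Messi"}, "sport": {"soccer"}}

coupled = coupled_bootstrap(seeds, occurrences, {frozenset(("athlete", "sport"))})
uncoupled = coupled_bootstrap(seeds, occurrences, set())
```

In this toy run the uncoupled extractors drift: "Ronaldo" is promoted as a sport, after which "soccer" is promoted as an athlete. With the mutual-exclusion constraint the ambiguous candidate is held back and neither category is polluted, at the cost of not promoting "Ronaldo" at all.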
Keywords: semi-supervised learning; labeled examples; academic fields
Source: VideoLectures.NET
Last reviewed: 2020-09-24:dingaq
Views: 83