Constructing a Focused Taxonomy from a Document Collection |
|
Course URL: | http://videolectures.net/eswc2013_divoli_taxonomy/ |
Lecturers: | Anna Divoli; Harald Sack |
Institution: | University of Potsdam, Germany |
Date: | 2013-07-08 |
Language: | English |
Abstract: | We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources such as Wikipedia, DBpedia, and Freebase, and to any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting the semantic relations that best group the concepts hierarchically. An RDF model supports interoperability between these steps and provides a flexible way of including existing NLP tools and further knowledge sources. From 2,000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows precision of 89% for concepts and 90% for relations; recall is 75% with respect to a manually generated taxonomy for the same domain. |
Keywords: | document collections; custom taxonomies; semantic relations; natural language processing tools |
Source: | VideoLectures.NET |
Last reviewed: | 2019-12-04:lxf |
Views: | 57 |