Automatic expansion of DBpedia exploiting Wikipedia cross-language information |
|
Course URL: | http://videolectures.net/eswc2013_palmero_aprosio_expansion/ |
Lecturers: | Harald Sack; Alessio Palmero Aprosio |
Institution: | University of Potsdam, Germany |
Date: | 2013-07-08 |
Course language: | English |
Course description: | DBpedia is a project aiming to represent Wikipedia content in RDF triples. It plays a central role in the Semantic Web, due to the large and growing number of resources linked to it. Nowadays, only 1.7M Wikipedia pages are deeply classified in the DBpedia ontology, although the English Wikipedia contains almost 4M pages, showing a clear problem of coverage. In other languages (like French and Spanish) this coverage is even lower. The objective of this paper is to define a methodology to increase the coverage of DBpedia in different languages. The major problems that we have to solve concern the high number of classes involved in the DBpedia ontology and the lack of coverage for some classes in certain languages. In order to deal with these problems, we first extend the population of the classes for the different languages by connecting the corresponding Wikipedia pages through cross-language links. Then, we train a supervised classifier using this extended set as training data. We evaluated our system using a manually annotated test set, demonstrating that our approach can add more than 1M new entities to DBpedia with high precision (90%) and recall (50%). The resulting resource is available through a SPARQL endpoint and a downloadable package. |
Keywords: | Wikipedia; supervised classification; cross-language links |
Source: | VideoLectures.NET |
Last reviewed: | 2019-12-04:lxf |
Views: | 70 |
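
The first step in the course description, extending class populations through cross-language links, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the page titles, the class labels, the `propagate_classes` helper, and the majority-vote rule for resolving conflicts are hypothetical and are not taken from the paper. The second step, training a supervised classifier on the expanded set, is not shown.

```python
from collections import Counter

# Hypothetical toy data: each dict groups pages about the same entity,
# keyed by language edition (as cross-language links would).
cross_language_links = [
    {"en": "Rome", "it": "Roma", "fr": "Rome"},
    {"en": "Barack_Obama", "it": "Barack_Obama"},
    {"it": "Alessandro_Manzoni", "fr": "Alessandro_Manzoni"},
]

# Hypothetical existing class assignments, keyed by (language, title).
known_classes = {
    ("en", "Rome"): "City",
    ("en", "Barack_Obama"): "OfficeHolder",
    ("fr", "Alessandro_Manzoni"): "Writer",
}


def propagate_classes(links, known):
    """Copy a class to every linked page that lacks one.

    Conflicts inside a group are resolved by majority vote, which is an
    assumption of this sketch rather than the paper's documented rule.
    """
    expanded = dict(known)
    for group in links:
        pages = list(group.items())  # [(language, title), ...]
        votes = Counter(known[page] for page in pages if page in known)
        if not votes:
            continue  # no edition of this entity is classified yet
        label, _ = votes.most_common(1)[0]
        for page in pages:
            expanded.setdefault(page, label)  # never overwrite a known class
    return expanded


expanded = propagate_classes(cross_language_links, known_classes)
```

In this toy setting the Italian page `Roma` inherits `City` from its English counterpart, so the expanded mapping can serve as a larger training set for each language.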