Web-Scale Extension of RDF Knowledge Bases from Templated Websites
Course URL: http://videolectures.net/iswc2014_usbeck_web_scale_extension/
Lecturer: Ricardo Usbeck
Institution: Leipzig University
Date: 2014-12-19
Language: English
Course description: Only a small fraction of the information on the Web is represented as Linked Data. This lack of coverage is partly due to the paradigms followed so far to extract Linked Data. While converting structured data to RDF is well supported by tools, most approaches to extracting RDF from semi-structured data rely on ad-hoc extraction methods. In this paper, we present a holistic, open-source framework for the extraction of RDF from templated websites. We discuss the architecture of the framework and the initial implementation of each of its components. In particular, we present a novel wrapper induction technique that does not require any human supervision to detect wrappers for websites. Our framework also includes a consistency layer with which the data extracted by the wrappers can be checked for logical consistency. We evaluate the initial version of REX on three different datasets. Our results clearly show the potential of using templated Web pages to extend the Linked Data Cloud. Moreover, our results indicate the weaknesses of our current implementations and how they can be extended.
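
The following minimal Python sketch illustrates the general idea in the description: apply a wrapper to pages that share a template, emit RDF triples, and check the result for a simple logical inconsistency. It is not REX's implementation. The page contents, the `example.org` URIs, the property names, and all function names are hypothetical, and the wrapper here is written by hand, whereas the paper's contribution is to induce it without supervision. The consistency check assumes the listed properties are functional (at most one value per subject).

```python
import xml.etree.ElementTree as ET

# Hypothetical property URIs (assumptions for illustration only).
TITLE = "http://example.org/prop/title"
DIRECTOR = "http://example.org/prop/director"

# Hypothetical templated pages: identical layout, different values.
PAGES = {
    "http://example.org/film/1": (
        "<html><body><h1>Alien</h1>"
        "<span class='director'>Ridley Scott</span></body></html>"
    ),
    "http://example.org/film/2": (
        "<html><body><h1>Heat</h1>"
        "<span class='director'>Michael Mann</span>"
        "<span class='director'>M. Mann</span></body></html>"
    ),
}

# Hand-written wrapper: one ElementTree path expression per property.
# A wrapper induction step would learn these paths automatically.
WRAPPER = {
    TITLE: ".//h1",
    DIRECTOR: ".//span[@class='director']",
}


def extract_triples(pages, wrapper):
    """Apply every wrapper path to every page and collect (s, p, o) triples."""
    triples = []
    for url, markup in pages.items():
        root = ET.fromstring(markup)  # assumes well-formed markup
        for prop, path in wrapper.items():
            for node in root.findall(path):
                text = (node.text or "").strip()
                if text:
                    triples.append((url, prop, text))
    return triples


def functional_violations(triples, functional_props):
    """Return (subject, property) pairs that have more than one distinct
    value for a property assumed to be functional."""
    values = {}
    for s, p, o in triples:
        if p in functional_props:
            values.setdefault((s, p), set()).add(o)
    return sorted(key for key, objs in values.items() if len(objs) > 1)


def to_ntriples(triples):
    """Serialize triples with plain string literals as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    extracted = extract_triples(PAGES, WRAPPER)
    print(to_ntriples(extracted))
    print("Inconsistent:", functional_violations(extracted, {TITLE, DIRECTOR}))
```

In this sketch the second page yields two different director values, so the check flags that subject and property pair. A real consistency layer would reason over an ontology rather than a hard-coded set of functional properties.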
Keywords: Linked Data Cloud; templated web page extension; wrapper induction; RDF extraction
Source: VideoLectures.NET
Data collected: 2021-05-28:zyk
Last reviewed: 2021-05-28:zyk