

A Systematic Investigation of Explicit and Implicit Schema Information on the Linked Open Data Cloud
Course URL: http://videolectures.net/eswc2013_gottron_cloud/
Lecturer: Olaf Hartig
Institution: University of Potsdam
Date: 2013-07-08
Language: English
Course description: Schema information about resources in the Linked Open Data (LOD) cloud can be provided in two ways: explicitly, by attaching RDF types to the resources, or implicitly, via the definition of the resources' properties. In this paper, we present a method and metrics to analyse the information-theoretic properties of, and the correlation between, the two manifestations of schema information. Furthermore, we perform such an analysis on large-scale linked data sets. To this end, we have extracted schema information regarding the types and properties defined in the data set segments provided for the Billion Triples Challenge 2012. We have conducted an in-depth analysis and have computed various entropy measures as well as the mutual information encoded in the two types of schema information. Our analysis provides insights into the information encoded in the different schema characteristics. Two major findings are that implicit schema information is far more discriminative and that applications involving schema information based on either types or properties alone will only capture between 63.5% and 88.1% of the schema information contained in the data. Based on these observations, we derive conclusions about the design of future schemas for LOD as well as potential application scenarios.
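The entropy and mutual information measures mentioned in the description can be illustrated with a minimal sketch. It assumes each resource is summarised by its set of RDF types (explicit schema) and its set of properties (implicit schema); the function names and the toy resources are hypothetical and are not taken from the lecture or from the Billion Triples Challenge data. It uses the standard Shannon definitions and is not necessarily the exact set of metrics used by the authors.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def schema_information(resources):
    """Entropies of type sets (T), property sets (P), and their combination.

    `resources` is a list of (type_set, property_set) pairs, one per resource.
    """
    h_t = entropy(Counter(t for t, _ in resources))
    h_p = entropy(Counter(p for _, p in resources))
    h_tp = entropy(Counter(resources))  # joint entropy H(T, P)
    return {
        "H(T)": h_t,
        "H(P)": h_p,
        "H(T,P)": h_tp,
        "I(T;P)": h_t + h_p - h_tp,  # mutual information
    }


# Hypothetical toy data: four resources with made-up type and property sets.
resources = [
    (frozenset({"foaf:Person"}), frozenset({"foaf:name", "foaf:knows"})),
    (frozenset({"foaf:Person"}), frozenset({"foaf:name"})),
    (frozenset({"foaf:Document"}), frozenset({"dc:title"})),
    (frozenset(), frozenset({"dc:title"})),  # untyped resource
]

result = schema_information(resources)
for name, value in result.items():
    print(f"{name} = {value:.3f} bits")
```

In this toy example, types alone and properties alone each carry 1.5 bits, while both together carry 2 bits, so either view on its own misses part of the combined schema information.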
Keywords: open data; feature encoding
Source: VideoLectures.NET
Data collected: 2021-03-24:zyk
Last reviewed: 2021-03-24:zyk
Views: 63