A snapshot of the OWL Web
Course URL: http://videolectures.net/iswc2013_matentzoglu_owl_web/
Lecturer: Nicolas Matentzoglu
Institution: University of Manchester
Date: 2013-11-28
Language: English
Abstract: Tool development for, and empirical experimentation in, OWL ontology engineering require a wide variety of suitable ontologies as input for testing and evaluation, as well as detailed characterisations of real ontologies. Empirical activities often resort to (somewhat arbitrarily) hand-curated corpora available on the web, such as the NCBO BioPortal and the TONES Repository, or to manually selected sets of well-known ontologies. Findings of surveys and results of benchmarking activities may be biased, even heavily, towards these datasets. Sampling from a large corpus of ontologies, on the other hand, may lead to more representative results. Current large-scale repositories and web crawls are mostly uncurated: they suffer from duplication, contain small and (for many purposes) uninteresting ontology files, and include large numbers of ontology versions, variants, and facets, so they do not lend themselves to random sampling. In this paper, we survey ontologies as they exist on the web and describe the creation of a corpus of OWL DL ontologies using strategies such as web crawling, various forms of de-duplication, and manual cleaning, which allows random sampling of ontologies for a variety of empirical applications.
Keywords: web crawling; corpus curation; random sampling; OWL
Source: VideoLectures.NET
Data collected: 2021-05-28 (zyk)
Last reviewed: 2021-05-28 (zyk)