Quality, Trust, and Utility of Scientific Data on the Web: Towards a Joint Model
Course URL: http://videolectures.net/acmwebsci2011_gamble_joint/
Lecturer: Matthew Gamble
Institution: University of Manchester
Date: 2011-07-19
Language: English
Abstract: In science, quality is paramount. As scientists increasingly look to the Web to share and discover scientific data, there is a growing need to support the scientist in assessing the quality of that data. However, quality is an ambiguous and overloaded term. In order to support the scientific user in discovering useful data, we have systematically examined the nature of "quality" by exploiting three prevalent properties of scientific data sets: (1) data quality is commonly defined objectively; (2) the provenance and lineage in its production has a well-understood role; and (3) "fitness-for-use" is a definition of utility rather than of quality or trust, where the quality and trustworthiness of the data and of the entities that produced that data inform its utility. Our study is presented in two stages. First, we review existing information quality dimensions and detail an assessment-oriented classification. We introduce definitions for quality, trust, and utility in terms of the entities required in their assessment: producer, provider, consumer, process, artifact, and quality standard. Next, we detail a novel and experimental approach to assessment that models the causal relationships between quality, trust, and utility dimensions through the construction of decision networks informed by provenance graphs. To ground and motivate our discussion throughout, we draw on the European Bioinformatics Institute's Gene Ontology Annotation database. We present an initial demonstration of our approach with an example that ranks results from the Gene Ontology Annotation database using an emerging objective quality measure, the Gene Ontology Annotation Quality score.
Keywords: Internet; scientific data; joint model
Source: VideoLectures.NET
Last reviewed: 2019-10-31:lxf
Views: 48