Measuring the Reusability of Test Collections
Course URL: http://videolectures.net/wsdm2010_carterette_mtro/
Lecturer: Ben Carterette
Institution: University of Delaware
Date: 2010-10-12
Language: English
Abstract: While test collection construction is a time-consuming and expensive process, the true cost is amortized by reusing the collection over hundreds or thousands of experiments. Some of these experiments may involve systems that retrieve documents not judged during the initial construction phase, and some of these systems may be “hard” to evaluate: depending on which judgments are missing and which judged documents were retrieved, the experimenter’s confidence in an evaluation could be very low. We propose two methods for quantifying the reusability of a test collection for evaluating new systems. The proposed methods provide simple yet highly effective tests for determining whether an existing set of judgments is useful for evaluating a new system. Empirical evaluations using TREC datasets confirm the usefulness of our proposed reusability measures. In particular, we show that our methods can reliably estimate confidence intervals that are indicative of collection reusability.
Keywords: test collections; statistics; confidence
Source: VideoLectures.NET
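The abstract above does not spell out the two proposed reusability measures. As a much simpler illustration of why missing judgments erode confidence (this is not the authors' method), the sketch below bounds precision at cutoff k by assuming every unjudged document in the top k is non-relevant (lower bound) or relevant (upper bound). If two systems' intervals overlap, the existing judgments alone cannot settle which system is better. The function names, the dict-based qrels format, and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def precision_at_k_bounds(ranking: List[str], qrels: Dict[str, int], k: int) -> Tuple[float, float]:
    """Hypothetical helper: (lower, upper) bounds on precision@k with unjudged documents.

    qrels maps judged document IDs to a relevance grade (> 0 means relevant);
    any document ID absent from qrels is treated as unjudged.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranking[:k]
    judged_relevant = sum(1 for d in top if qrels.get(d, 0) > 0)
    unjudged = sum(1 for d in top if d not in qrels)
    # Lower bound: all unjudged documents are non-relevant.
    # Upper bound: all unjudged documents are relevant.
    # Like standard P@k, we divide by k even if the ranking is shorter than k.
    return judged_relevant / k, (judged_relevant + unjudged) / k


def ordering_is_determined(bounds_a: Tuple[float, float], bounds_b: Tuple[float, float]) -> bool:
    """True if the two intervals do not overlap, i.e. the existing judgments
    fix which system scores higher at this cutoff whatever the unjudged documents are."""
    lo_a, hi_a = bounds_a
    lo_b, hi_b = bounds_b
    return lo_a > hi_b or lo_b > hi_a


if __name__ == "__main__":
    qrels = {"d1": 1, "d2": 0, "d3": 1, "d4": 0}  # toy judgments
    run_a = ["d1", "d3", "d2", "d9", "d8"]  # d9, d8 are unjudged
    run_b = ["d2", "d4", "d1", "d7", "d6"]  # d7, d6 are unjudged
    a = precision_at_k_bounds(run_a, qrels, 5)
    b = precision_at_k_bounds(run_b, qrels, 5)
    print(a, b)                          # (0.4, 0.8) (0.2, 0.6)
    print(ordering_is_determined(a, b))  # False: the intervals overlap
```

Worst-case bounds like these are deliberately pessimistic; the confidence intervals described in the abstract are presumably tighter, since they come from statistical estimates rather than all-or-nothing assumptions about the unjudged documents.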