


Ranking From Pairs and Triplets: Information Quality, Evaluation Methods and Query Complexity
Course URL: http://videolectures.net/wsdm2011_radinsky_rfp/
Lecturer: Kira Radinsky
Institution: Technion - Israel Institute of Technology
Date: 2011-08-09
Language: English
Course description: Obtaining judgments from human raters is a vital part of designing search engine evaluation. Today, there is a discrepancy between how judgments are acquired from raters (the training phase) and how the responses are used for retrieval evaluation (the evaluation phase). This discrepancy stems from an inconsistency in how information is represented in the two phases. During training, raters are asked to provide a relevance score for an individual result in the context of a query, whereas evaluation is performed on ordered lists of search results, with each result's position relative to the other results taken into account. As an alternative to the practice of learning to rank from relevance judgments on individual search results, more and more attention has recently shifted to the theory and practice of learning from answers to combinatorial questions about sets of search results. That is, during training, users are asked to rank small sets (typically pairs). We start by statistically comparing human rater responses to questions on the relevance of individual results with responses to questions on pairs of results. We show empirically that neither type of response can be deduced from the other, and that the added context created when results are shown together changes the raters' evaluation process. Since pairwise judgments are directly related to ranking, we conclude they are more accurate for that purpose. We then go beyond pairs to show that triplets do not contain significantly more information than pairs for the purpose of measuring statistical preference. These two results establish good stability properties of pairwise comparisons for the purpose of learning to rank. We further analyze different scenarios in which results of varying quality are added as "decoys". A recurring source of worry in papers focusing on pairwise comparison is the quadratic number of pairs in a set of results. Which preferences do we choose to solicit from paid raters? Can we provably eliminate the quadratic cost? We initiate a rigorous statistical learning theoretical study of the question of which and how many pairs we need to choose, and prove a result indicating that only O(n polylog n) pairs are required in order to obtain an almost perfect ordering. These pairs are chosen adaptively, and hence our scheme suggests an active learning algorithm.
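To make the query-complexity point concrete, the following is a minimal sketch that contrasts querying all n(n-1)/2 pairs with choosing pairs adaptively. It is not the algorithm from the talk: it assumes a noise-free, transitive rater, in which case an ordinary comparison sort already needs only O(n log n) adaptive queries, whereas the talk addresses the harder setting of real rater preferences. The names `rank_with_adaptive_pairs`, `rank_with_all_pairs`, and `prefer` are hypothetical and introduced only for illustration.

```python
import functools
import random
from itertools import combinations


def rank_with_adaptive_pairs(items, prefer):
    """Rank items best-first, querying a pair only when the sort asks for it.

    `prefer(a, b)` is an assumed rater oracle returning True if a beats b.
    Returns (ranking, number_of_pairwise_queries).
    """
    queries = 0

    def compare(a, b):
        nonlocal queries
        queries += 1  # each comparison is one paid pairwise judgment
        return -1 if prefer(a, b) else 1

    ranking = sorted(items, key=functools.cmp_to_key(compare))
    return ranking, queries


def rank_with_all_pairs(items, prefer):
    """Baseline: query every one of the n(n-1)/2 pairs, then rank by wins."""
    wins = {item: 0 for item in items}
    queries = 0
    for a, b in combinations(items, 2):
        queries += 1
        winner = a if prefer(a, b) else b
        wins[winner] += 1
    ranking = sorted(items, key=lambda item: wins[item], reverse=True)
    return ranking, queries


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    # Assumption: a hidden relevance score per result stands in for the rater.
    relevance = {f"doc{i}": rng.random() for i in range(n)}
    docs = list(relevance)

    def prefer(a, b):
        # Noise-free simulated rater: always prefers the more relevant result.
        return relevance[a] > relevance[b]

    adaptive, q_adaptive = rank_with_adaptive_pairs(docs, prefer)
    exhaustive, q_all = rank_with_all_pairs(docs, prefer)
    print("same ranking:", adaptive == exhaustive)
    print("adaptive queries:", q_adaptive, "exhaustive queries:", q_all)
```

In this idealized setting both functions return the same ordering, but the adaptive version issues far fewer queries because each answer determines which pair is asked next. With noisy or non-transitive human judgments a plain sort gives no such guarantee, which is the gap the O(n polylog n) result is meant to address.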
Keywords: acquisition; rating judgments; components
Course source: VideoLectures.NET
Last reviewed: 2020-06-29 (wuyq)