Revisiting Globally Sorted Indexes for Efficient Document Retrieval
Course URL: http://videolectures.net/wsdm2010_yan_rgsi/
Lecturer: Hao Yan
Institution: New York University
Course date: 2010-10-21
Language: English
Course description: There has been a large amount of research on efficient document retrieval in both the IR and web search areas. One important technique for improving retrieval efficiency is early termination, which speeds up query processing by avoiding a scan of the entire inverted lists. Most early termination techniques first build new inverted indexes by sorting the inverted lists by either term-dependent information (e.g., term frequencies or term IR scores) or term-independent information (e.g., the static rank of the document), and then apply appropriate retrieval strategies to the resulting indexes. Although methods based only on static rank have been shown to be ineffective for early termination, methods based on term-independent information still have many advantages. In this paper, we propose new techniques to organize inverted indexes based on term-independent information beyond static rank, and we study new retrieval strategies on the resulting indexes. We perform a detailed experimental evaluation of our new techniques and compare them with existing approaches. Our results on the TREC GOV and GOV2 data sets show that our techniques can improve query efficiency significantly.
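To make the idea of early termination on a globally sorted index concrete, here is a minimal sketch. It is not the technique proposed in the talk; it only illustrates the baseline setting the description mentions, where every inverted list shares one term-independent order (static rank). The function name `top_k_early_termination`, the additive scoring model (static rank plus summed term scores), and the data layout are all assumptions made for illustration.

```python
import heapq
from itertools import groupby


def top_k_early_termination(postings, static_rank, query, k):
    """Return (top-k [(doc_id, score)], number of documents scored).

    Assumptions (hypothetical, for illustration only):
    - doc ids are assigned so that static_rank[doc_id] is non-increasing;
    - postings maps term -> list of (doc_id, term_score) sorted by doc_id;
    - score(doc) = static_rank[doc] + sum of its term scores (OR semantics).
    """
    lists = [postings[t] for t in query if postings.get(t)]
    if not lists:
        return [], 0

    # Upper bound on the total term contribution of any document.
    # A real index would store each list's maximum; it is computed here for brevity.
    bound_terms = sum(max(score for _, score in lst) for lst in lists)

    heap = []      # min-heap of (score, doc_id) holding the current top k
    scanned = 0
    merged = heapq.merge(*lists)  # lazy merge in doc-id (static-rank) order
    for doc_id, group in groupby(merged, key=lambda p: p[0]):
        # Static rank only decreases from here on, so no later document can
        # strictly beat the current k-th best score once this bound holds.
        if len(heap) == k and heap[0][0] >= static_rank[doc_id] + bound_terms:
            break
        scanned += 1
        score = static_rank[doc_id] + sum(s for _, s in group)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))

    ranked = sorted(heap, key=lambda x: (-x[0], x[1]))
    return [(d, s) for s, d in ranked], scanned
```

Under this scoring model the loop can stop as soon as the best score any remaining document could reach falls to or below the current k-th best score. When term scores dominate static rank, that bound is rarely met early, which is consistent with the description's remark that static rank alone is ineffective for early termination.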
Keywords: computer science; information retrieval; documents
Source: VideoLectures.NET
Last reviewed: 2020-06-06:zyk