


Quality-Biased Ranking of Web Documents
Course URL: http://videolectures.net/wsdm2011_bendersky_qbr/
Lecturer: Michael Bendersky
Institution: Google
Date: 2011-08-09
Language: English
Abstract: Many existing retrieval approaches do not take into account the content quality of the retrieved documents, although link-based measures such as PageRank are commonly used as a form of document prior. In this paper, we present a quality-biased ranking method that promotes documents containing high-quality content and penalizes low-quality documents. The quality of a document's content can be determined by its readability, layout and ease of navigation, among other factors. Accordingly, instead of using a single estimate of document quality, we consider multiple content-based features that are directly integrated into a state-of-the-art retrieval method. These content-based features are easy to compute, store and retrieve, even for large web collections. We use several query sets and web collections to empirically evaluate the performance of our quality-biased retrieval method. In each case, our method consistently outperforms, by a large margin, text-based and link-based retrieval methods that do not take into account the quality of the document content.
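The abstract describes combining several cheap, content-based quality features with a text relevance score instead of relying on a single quality estimate. The Python sketch below illustrates that general idea under stated assumptions: the feature set (stopword fraction, anchor-text fraction, term entropy, average term length), the tiny stopword list, the names `quality_features` and `quality_biased_score`, and the hand-set values in `WEIGHTS` are all hypothetical, not the features or learned parameters from the paper, and the simple additive (log-linear) combination stands in for whatever integration the full method actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "that", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def quality_features(visible_text: str, anchor_text: str) -> dict[str, float]:
    """Compute a few cheap content-based quality features (hypothetical set)."""
    terms = tokenize(visible_text)
    anchors = tokenize(anchor_text)
    n = len(terms)
    if n == 0:
        # No visible text: treat the page as all navigation, no content.
        return {"stopword_frac": 0.0, "anchor_frac": 1.0, "entropy": 0.0, "avg_term_len": 0.0}
    counts = Counter(terms)
    # Entropy of the term distribution; highly repetitive (spammy) text scores low.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return {
        # Natural prose tends to contain many stopwords.
        "stopword_frac": sum(1 for t in terms if t in STOPWORDS) / n,
        # Share of text that is link anchors, a rough proxy for navigation-heavy layout.
        "anchor_frac": min(1.0, len(anchors) / n),
        "entropy": entropy,
        "avg_term_len": sum(len(t) for t in terms) / n,
    }


# Hypothetical weights; a real system would learn these from relevance judgments.
WEIGHTS = {"stopword_frac": 1.0, "anchor_frac": -1.0, "entropy": 0.2, "avg_term_len": 0.05}


def quality_biased_score(text_score: float, feats: dict[str, float]) -> float:
    """Add a weighted quality prior to a text relevance score (log-linear combination)."""
    return text_score + sum(WEIGHTS[name] * value for name, value in feats.items())


if __name__ == "__main__":
    good = quality_features(
        "The history of the printing press is a story of how ideas spread in the modern world.",
        "home",
    )
    spam = quality_features(
        "cheap pills cheap pills cheap pills buy cheap pills",
        "cheap pills cheap pills buy cheap pills cheap pills",
    )
    # With equal text scores, the quality prior breaks the tie in favor of the prose page.
    print(quality_biased_score(5.0, good) > quality_biased_score(5.0, spam))
```

Because each feature reduces to a single number per document, such values could be precomputed at indexing time and stored alongside the index, which is consistent with the abstract's point that the features are easy to compute, store and retrieve at web scale.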
Keywords: retrieval methods; retrieval performance; retrieval quality
Source: VideoLectures.NET
Last reviewed: 2020-06-01, 吴雨秋 (volunteer course editor)