Learning Concept Importance Using a Weighted Dependence Model
Course URL: http://videolectures.net/wsdm2010_bendersky_lci/
Lecturer: Michael Bendersky
Institution: Google
Date: 2010-02-22
Language: English
Abstract: Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where relevance at high ranks is particularly critical. Most previous work, however, treats all concepts as equally important, an assumption that often does not hold, especially for longer, more complex queries. In this paper, we show that one of the most effective existing term dependence models can be naturally extended by assigning weights to concepts. We demonstrate that the weighted dependence model can be trained using existing learning-to-rank techniques, even with a relatively small number of training queries. Our study compares the effectiveness of both endogenous (collection-based) and exogenous (based on external sources) features for determining concept importance. To test the weighted dependence model, we perform experiments on both publicly available TREC corpora and a proprietary web corpus. Our experimental results indicate that our model consistently and significantly outperforms both the standard bag-of-words model and the unweighted term dependence model, and that combining endogenous and exogenous features generally results in the best retrieval effectiveness.
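The idea of extending a term dependence model by weighting each concept can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: concepts are query unigrams and adjacent bigrams (a sequential-dependence-style model, with any unordered-window component omitted); the score is score(Q, D) = Σ_q λ(q) · f(q, D), where each concept weight λ(q) = Σ_j w_j · g_j(q) is a linear combination of concept features; and f(q, D) is a Dirichlet-smoothed log-likelihood. The feature names (`const`, `log_df`), the parameter values, and the toy collection are hypothetical placeholders rather than features or learned values from the talk. Training the parameters with a learning-to-rank method is not shown.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A concept is a query unigram, e.g. ("york",), or an adjacent bigram, e.g. ("new", "york").
Concept = Tuple[str, ...]
FeatureFn = Callable[[Concept], float]


def query_concepts(query: Sequence[str]) -> List[Concept]:
    """Unigram concepts followed by adjacent-pair (bigram) concepts of the query."""
    unigrams = [(t,) for t in query]
    bigrams = [tuple(query[i:i + 2]) for i in range(len(query) - 1)]
    return unigrams + bigrams


def occurrences(concept: Concept, tokens: Sequence[str]) -> int:
    """Count exact, in-order occurrences of the concept in a token sequence."""
    n = len(concept)
    return sum(
        1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == concept
    )


def match(concept: Concept, doc: Sequence[str],
          collection: Sequence[Sequence[str]], mu: float = 2500.0) -> float:
    """f(q, D): Dirichlet-smoothed log-likelihood of the concept in the document."""
    coll_len = sum(len(d) for d in collection)
    coll_count = sum(occurrences(concept, d) for d in collection)
    # Add-one on the collection statistics so unseen concepts never hit log(0).
    p_coll = (coll_count + 1) / (coll_len + 1)
    return math.log((occurrences(concept, doc) + mu * p_coll) / (len(doc) + mu))


def concept_weight(concept: Concept, features: Dict[str, FeatureFn],
                   params: Dict[str, float]) -> float:
    """lambda(q) = sum_j w_j * g_j(q), a linear combination of concept features."""
    return sum(params[name] * g(concept) for name, g in features.items())


def score(query: Sequence[str], doc: Sequence[str],
          collection: Sequence[Sequence[str]], features: Dict[str, FeatureFn],
          unigram_params: Dict[str, float], bigram_params: Dict[str, float]) -> float:
    """Sum of lambda(q) * f(q, D); unigrams and bigrams use separate parameters."""
    total = 0.0
    for c in query_concepts(query):
        params = unigram_params if len(c) == 1 else bigram_params
        total += concept_weight(c, features, params) * match(c, doc, collection)
    return total


# Toy usage: the collection, features, and parameter values are hypothetical.
collection = [
    "cheap flights to new york".split(),
    "new york city hotels".split(),
    "flights and hotels in york".split(),
]


def doc_freq(c: Concept) -> int:
    """Endogenous (collection-based) statistic: documents containing the concept."""
    return sum(1 for d in collection if occurrences(c, d) > 0)


features: Dict[str, FeatureFn] = {
    "const": lambda c: 1.0,                       # constant feature
    "log_df": lambda c: math.log1p(doc_freq(c)),  # hypothetical endogenous feature
}
uni = {"const": 0.8, "log_df": 0.1}  # placeholder values, not learned
bi = {"const": 0.1, "log_df": 0.1}

query = "new york flights".split()
ranked = sorted(collection,
                key=lambda d: score(query, d, collection, features, uni, bi),
                reverse=True)
print(" ".join(ranked[0]))
```

Setting both `log_df` weights to zero gives every unigram the same weight and every bigram the same weight, which corresponds to an unweighted dependence model; a nonzero weight is what lets concept importance vary within a single query. An exogenous feature (for example, a concept frequency taken from an external source) would simply be another entry in `features`.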
Keywords: Computer Science; Web Search; Corpora
Source: VideoLectures.NET
Last reviewed: 2021-01-31 (nkq)