A Model to Estimate Intrinsic Document Relevance from the Clickthrough Logs of a Web Search Engine
Course URL: http://videolectures.net/wsdm2010_dupret_amtei/
Lecturer: Georges Dupret
Institution: Yahoo! Inc.
Date: 2010-10-18
Language: English
Abstract: We propose a new model to interpret the clickthrough logs of a web search engine. The model rests on explicit assumptions about user behavior. In particular, we draw conclusions about a document's relevance by observing what the user does after examining the document, not by whether the user clicks the document's URL. This yields a model based on intrinsic relevance, as opposed to perceived relevance. We use the model to predict document relevance and then use this prediction as a feature for a “Learning to Rank” machine learning algorithm. Comparing the ranking functions obtained by training the algorithm with and without the new feature, we observe surprisingly good results. This is particularly notable given that our baseline is the heavily optimized ranking function of a leading commercial search engine. A deeper analysis shows that the new feature is particularly helpful for non-navigational queries and for queries with a high abandonment rate or a large average number of queries per session. This matters because these types of queries are considered the most difficult to solve.
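The abstract does not give the model's equations, so the following is only a toy illustration of the distinction it draws between perceived relevance (does the user click?) and intrinsic relevance (what does the user do after examining the document?). It is not the authors' model. It assumes that every displayed result was examined, so there is no position-bias or examination model. It also uses a hypothetical `ended_search` flag, meaning "no further click or query after this click", as a crude stand-in for post-examination behavior. `SessionEvent` and `estimate_relevance` are illustrative names, not part of any real library or of the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SessionEvent:
    """One displayed (query, document) pair from a hypothetical log record."""
    query: str
    doc: str
    clicked: bool
    # Hypothetical proxy for post-examination behavior: True if, after
    # clicking this document, the user made no further click or query.
    ended_search: bool = False


def estimate_relevance(events):
    """Return (perceived, intrinsic) relevance per (query, doc) pair.

    perceived: click-through rate, clicks / impressions.
    intrinsic: fraction of clicks after which the search ended.
    Assumes every displayed result was examined (no position-bias model).
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    satisfied = defaultdict(int)
    for e in events:
        key = (e.query, e.doc)
        impressions[key] += 1
        if e.clicked:
            clicks[key] += 1
            if e.ended_search:
                satisfied[key] += 1
    perceived = {k: clicks[k] / impressions[k] for k in impressions}
    # Intrinsic relevance is only defined for documents clicked at least once.
    intrinsic = {k: satisfied[k] / clicks[k] for k in impressions if clicks[k] > 0}
    return perceived, intrinsic


if __name__ == "__main__":
    log = (
        [SessionEvent("jaguar", "a", c, s)
         for c, s in [(True, False), (True, False), (True, True), (False, False)]]
        + [SessionEvent("jaguar", "b", False)] * 3
        + [SessionEvent("jaguar", "b", True, True)]
    )
    perceived, intrinsic = estimate_relevance(log)
    print(perceived)
    print(intrinsic)
```

In this made-up log, document "a" has the higher click-through rate (0.75 vs 0.25). Document "b" scores higher on the intrinsic proxy (1.0 vs about 0.33) because users stopped searching after clicking it. In the setting the abstract describes, a per-(query, document) score of this kind would be passed as one extra feature to the learning-to-rank algorithm. Here the sketch only returns it.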
Keywords: web search engine; machine learning algorithm; non-navigational queries
Source: VideoLectures.NET
Last reviewed: 2020-01-13 (chenxin)