

Identifying Task-based Sessions in Search Engine Query Logs
Course URL: http://videolectures.net/wsdm2011_tolomei_itb/
Lecturer: Gabriele Tolomei
Institution: University of Venice
Date: 2011-08-09
Language: English
Abstract: This paper addresses the challenge of devising effective techniques for identifying task-based sessions, i.e., sets of possibly non-contiguous queries issued by a user of a Web search engine to carry out a given task. To evaluate and compare different approaches, we built a ground truth through a manual labelling process in which the queries of a given query log were grouped into tasks. Our analysis of this ground truth shows that users tend to perform more than one task at the same time: about 75% of the submitted queries involve multi-tasking activity. We formally define the Task-based Session Discovery Problem (TSDP) as the problem of best approximating the manually annotated tasks, and we propose several variants of well-known clustering algorithms, as well as a novel, efficient heuristic algorithm specifically tuned for solving the TSDP. These algorithms also exploit the collaborative knowledge collected by Wiktionary and Wikipedia to detect query pairs that are not lexically similar but are semantically related. The proposed algorithms were evaluated on this ground truth and are shown to outperform state-of-the-art approaches because they effectively take into account the multi-tasking behavior of users.
Keywords: search engines; heuristic algorithms; computer science
Source: VideoLectures.NET
Last reviewed: 2020-01-06:chenxin