

Extracting Semantic Relations from Query Logs
Course URL: http://videolectures.net/kdd07_baeza_yates_esr/
Lecturer: Ricardo Baeza-Yates
Institution: NTENT
Date: 2007-09-14
Language: English
Description: In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed them. We first propose a novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph. We then analyze the graph produced by our query log, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws, shedding some light on the behavior of search users as well as on the distribution of topics that people look for on the Web. The representation we introduce allows us to infer interesting semantic relationships between queries. Second, we provide an experimental analysis of the quality of these relations, showing that most of them are relevant. Finally, we sketch an application that detects multitopical URLs.
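As a rough illustration of the query representation mentioned in the description, the sketch below builds one sparse vector per query from (query, clicked URL) pairs and links two queries when their vectors overlap. It is a minimal sketch under stated assumptions, not the authors' implementation: the click-count weighting, the cosine similarity, the function names, and the sample data are all hypothetical choices made for this example.

```python
from collections import defaultdict
from math import sqrt


def query_vectors(clicks):
    """Map each query to a sparse vector {url: click_count}.

    `clicks` is an iterable of (query, clicked_url) pairs, i.e. the
    edges of the query-click bipartite graph. Weighting by raw click
    count is an assumption made for this sketch.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for query, url in clicks:
        vectors[query][url] += 1
    return {q: dict(v) for q, v in vectors.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[url] for url, w in u.items() if url in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def query_graph(vectors):
    """Connect every pair of queries that share at least one clicked URL.

    Returns {(query_a, query_b): similarity} with query_a < query_b.
    The quadratic loop is fine for a toy example only.
    """
    edges = {}
    queries = sorted(vectors)
    for i, a in enumerate(queries):
        for b in queries[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim > 0:
                edges[(a, b)] = sim
    return edges


# Hypothetical click data, invented for illustration.
clicks = [
    ("jaguar car", "example.com/cars"),
    ("jaguar price", "example.com/cars"),
    ("jaguar price", "example.com/dealers"),
    ("big cats", "example.org/felines"),
]
vectors = query_vectors(clicks)
edges = query_graph(vectors)
```

In this toy data only "jaguar car" and "jaguar price" end up connected, because they are the only pair of queries with a clicked URL in common.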
Keywords: vector space queries; power laws; semantic queries
Source: VideoLectures.NET
Last reviewed: 2020-06-29 (yumf)
Views: 139