Extracting Semantic Relations from Query Logs
Course URL: http://videolectures.net/kdd07_baeza_yates_esr/
Lecturer: Ricardo Baeza-Yates
Institution: 恩特公司
Course date: 2017-09-24
Course language: English
Course description: In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed them. We first propose a novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph. We then analyze the graph produced by our query log, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws, shedding some light on the behavior of search users as well as on the distribution of topics that people look for on the Web. The representation we introduce makes it possible to infer interesting semantic relationships between queries. Second, we provide an experimental analysis of the quality of these relations, showing that most of them are relevant. Finally, we sketch an application that detects multitopical URLs.
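The idea of representing queries in a vector space derived from the query-click bipartite graph can be illustrated with a minimal sketch. This is not the authors' implementation: the click log, the helper names `query_vectors` and `cosine`, the use of raw click counts as weights, and cosine similarity as the relatedness measure are all assumptions made for illustration only.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical click log of (query, clicked URL) pairs; not data from the paper.
CLICK_LOG = [
    ("cheap flights", "travel.example/flights"),
    ("cheap flights", "airline.example"),
    ("airline tickets", "travel.example/flights"),
    ("airline tickets", "airline.example"),
    ("python tutorial", "docs.example/python"),
]


def query_vectors(click_log):
    """Map each query to a sparse vector {url: click count}.

    Each query is one side of the bipartite graph; the URLs it was
    clicked for are its dimensions. Raw counts are an assumed weighting.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        vectors[query][url] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(url, 0) for url, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = query_vectors(CLICK_LOG)
related = cosine(vectors["cheap flights"], vectors["airline tickets"])
unrelated = cosine(vectors["cheap flights"], vectors["python tutorial"])
```

In this toy log, two queries that lead to the same clicked URLs end up with nearly identical vectors, while queries with no clicked URL in common have similarity zero. A URL that attracts clicks from otherwise unrelated groups of queries would be a natural candidate for the multitopical URLs mentioned in the description.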
Keywords: implicit capture; semantic relations; multitopical URLs; applications
Course source: VideoLectures.NET
Data collected: 2020-04-09:zhouxj
Last reviewed: 2020-05-25:cxin
Views: 63