Precomputing Search Features for Fast and Accurate Query Classification
Course URL: http://videolectures.net/wsdm2010_konig_psf/
Lecturer: Arnd Christian König
Institution: Microsoft
Date: Not available.
Language: English
Course description: Query intent classification is crucial for web search and advertising. It is known to be challenging because web queries contain fewer than three words on average, and so provide little signal on which to base classification decisions. At the same time, the vocabulary used in search queries is vast: thus, classifiers based on word occurrence have to deal with a very sparse feature space, and often require large amounts of training data. Prior efforts to address the issue of feature sparseness augmented the feature space using features computed from the results obtained by issuing the query to be classified against a web search engine. However, these approaches induce high latency, making them unacceptable in practice. In this paper, we propose a new class of features that realizes the benefit of search-based features without high latency. These features leverage co-occurrence between the query keywords and tags applied to documents in search results, resulting in a significant boost to web query classification accuracy. By precomputing the tag incidence for a suitably chosen set of keyword combinations, we are able to generate the features online with low latency and memory requirements. We evaluate the accuracy of our approach using a large corpus of real web queries in the context of commercial search.
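Illustrative sketch (not taken from the paper): the split between an offline precomputation step and an online lookup step described above can be pictured with a toy example in Python. Everything here is an assumption made for illustration: the corpus `DOCS`, the tag names, the helper names `keyword_combos`, `precompute_tag_incidence` and `query_features`, and the choice to enumerate every keyword combination up to size 2. The description only says the combinations are "suitably chosen" and does not state the selection strategy, so this is a simplification rather than the authors' method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: (document text, tags applied to the document).
DOCS = [
    ("cheap flights to paris", {"travel"}),
    ("paris hotel deals", {"travel", "shopping"}),
    ("python list tutorial", {"programming"}),
    ("python snake care", {"pets"}),
    ("cheap python books", {"programming", "shopping"}),
]


def keyword_combos(words, max_size=2):
    """Return all keyword combinations of size 1..max_size as sorted tuples."""
    unique = sorted(set(words))
    combos = []
    for size in range(1, max_size + 1):
        combos.extend(combinations(unique, size))
    return combos


def precompute_tag_incidence(docs, max_size=2):
    """Offline step: for each keyword combination, count the tags of the
    documents that contain all keywords of that combination."""
    table = {}
    for text, tags in docs:
        for combo in keyword_combos(text.split(), max_size):
            table.setdefault(combo, Counter()).update(tags)
    return table


def query_features(query, table, max_size=2):
    """Online step: build tag features from table lookups only.
    No search engine is queried at classification time."""
    totals = Counter()
    for combo in keyword_combos(query.lower().split(), max_size):
        totals.update(table.get(combo, {}))
    n = sum(totals.values())
    # Normalize counts so the features form a distribution over tags.
    return {tag: count / n for tag, count in totals.items()} if n else {}


TABLE = precompute_tag_incidence(DOCS)
features = query_features("python books", TABLE)
print(features)
```

In this toy setup the query "python books" receives most of its weight on the hypothetical "programming" tag, and a query whose keywords never appear in the table yields an empty feature set. A real system would need to bound the table size, which is presumably where the choice of keyword combinations matters.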
Keywords: Semantic Web; annotation; computer science; web search
Source: VideoLectures.NET
Last reviewed: 2020-06-08 by 吴雨秋 (volunteer course editor)
Views: 36