0


基于竞价景观的搜索拍卖优化查询聚类

Query Clustering based on Bid Landscape for Sponsored Search Auction Optimization
课程网址: http://videolectures.net/kdd2013_chen_auction_optimization/  
主讲教师: Ye Chen
开课单位: 微软公司
开课时间: 2013-09-27
课程语种: 英语
中文简介:

在赞助的搜索拍卖中,拍卖师通过设置许多拍卖参数(例如为拍卖优化任务准备的底价)来操纵市场。可以为每个单独的关键字设置拍卖参数,但是由于关键字的数量为数百万个,因此优化问题变得棘手。为了降低维数并很好地概括,人们希望将关键字或查询聚类为有意义的组,并在关键字聚类级别设置参数。为了优化拍卖,相对于广告商对它们的估价,关键字应被视为可互换商品,用出价分布或风景表示。因此,用于拍卖优化的聚类关键字应基于其出价分布。在本文中,我们提出了聚类概率分布的形式化方法,并将其应用于查询聚类,其中每个查询均表示为点击率(CTR)加权出价和失真的概率密度,并通过KL散度来度量。我们首先导出用于聚类高斯密度的k均值变体,其具有闭合形式的KL发散。然后,我们开发一种用于对高斯混合密度进行聚类的算法,该算法可以对单个高斯进行概括,并且通常是针对现实世界数据的更现实的参数假设。高斯混合密度之间的KL散度不再在分析上易于处理;因此,我们推导了一种变分EM算法,该算法使簇KL散度内的总数的上限最小。聚类算法已成功部署到生产中,极大地提高了收入,并提高了现有生产系统的点击率。尽管受到查询聚类的特定设置的启发,但所提出的聚类方法通常适用于许多实际应用,在该应用中,与经典k均值一样,示例在欧氏空间中比有限维特征向量具有更好的分布特征。

课程简介: In sponsored search auctions, the auctioneer operates the marketplace by setting a number of auction parameters such as reserve prices for the task of auction optimization. The auction parameters may be set for each individual keyword, but the optimization problem becomes intractable since the number of keywords is in the millions. To reduce the dimensionality and generalize well, one wishes to cluster keywords or queries into meaningful groups, and set parameters at the keyword-cluster level. For auction optimization, keywords shall be deemed as interchangeable commodities with respect to their valuations from advertisers, represented as bid distributions or landscapes. Clustering keywords for auction optimization shall thus be based on their bid distributions. In this paper we present a formalism of clustering probability distributions, and its application to query clustering where each query is represented as a probability density of click-through rate (CTR) weighted bid and distortion is measured by KL divergence. We first derive a k-means variant for clustering Gaussian densities, which have a closed-form KL divergence. We then develop an algorithm for clustering Gaussian mixture densities, which generalize a single Gaussian and are typically a more realistic parametric assumption for real-world data. The KL divergence between Gaussian mixture densities is no longer analytically tractable; hence we derive a variational EM algorithm that minimizes an upper bound of the total within-cluster KL divergence. The clustering algorithm has been deployed successfully into production, yielding significant improvement in revenue and clicks over the existing production system. While motivated by the specific setting of query clustering, the proposed clustering method is generally applicable to many real-world applications where an example is better characterized by a distribution than a finite-dimensional feature vector in Euclidean space as in the classical k-means.
关 键 词: 变分算法; 向量分布
课程来源: 视频讲座网
数据采集: 2020-11-24:zyk
最后编审: 2020-11-24:zyk
阅读次数: 35