0


强盗设置的战略考量-在线广告拍卖的真实价格

Strategic Considerations in Bandit Settings - The Price of Truthfulness in Online Ad Auctions
课程网址: http://videolectures.net/icml09_kakade_itscbs/  
主讲教师: Kakade Sham M
开课单位: 微软公司
开课时间: 2009-08-26
课程语种: 英语
中文简介:
近年来,在机器学习和经济学交叉点的研究蓬勃发展,因为人们认识到许多技术系统(如互联网)在被视为经济系统而不仅仅是技术系统时更容易被理解和管理。特别是,最近有很多关于理解学习算法和模型的工作都是在必须考虑参与者的战略考虑的环境中进行的(例如,垃圾邮件检测)。本次演讲将探讨在线广告拍卖设置中的更广泛问题(尤其是“点击付费”拍卖,即广告商仅在点击广告时收取这些轮次的费用)。设计这样的拍卖面临着典型的探索/利用困境:当收集有关广告商点击率的信息时,这种机制可能会导致收入松动;然而,收集到的信息在未来可能会证明对于更有利可图的分配是有价值的。从这个意义上说,这些机制是使用多武器土匪技术设计的主要候选机制。然而,多臂强盗算法的幼稚应用不会考虑玩家的战略考虑——玩家可能会通过操纵出价(决定拍卖收入)来最大化自己的效用。因此,我们认为拍卖是真实的自然限制。这部作品尖锐地描述了在这种真实的限制下,后悔是可以实现的。有趣的是,我们发现这个限制对可实现的遗憾施加了统计限制——它是theta(t^2/3),而对于传统的Bandit算法(没有这个真实的限制),可实现的遗憾是o(t^1/2)(其中t是轮数)。我们称之为额外的t^1/6因素,即真实性的价格。
课程简介: Research at the intersection of machine learning and economics has flourished in recent years due to the realization that many technological systems (such as the internet) are better understood and managed when they are viewed as economic systems rather than just merely technological ones. In particular, there is much recent work on understanding learning algorithms and models in settings where the strategic considerations of the participants must be taken into account (e.g. spam detection). This talk will examine the this broader issue in the setting of online ad-auctions (in particular "pay-per-click" auctions, where advertisers are charged only those rounds when their ad is clicked on). Designing such an auction faces the classic explore/exploit dilemma: while gathering information about the "click-through-rates" of advertisers, the mechanism may loose revenue; however, this gleaned information may prove valuable in the future for a more profitable allocation. In this sense, such mechanisms are prime candidates to be designed using multi-armed bandit techniques. However, a naive application of multi-armed bandit algorithms would not take into account the strategic considerations of the players - players might manipulate their bids (which determine the auction's revenue) in a way as to maximize their own utility. Hence, we consider the natural restriction that the auction be truthful. This work sharply characterizes what regret is achievable, under this truthful restriction. Interestingly, we show that this restriction imposes statistical limits on the achievable regret - that it is Theta(T^2/3), while for traditional bandit algorithms (without this truthful restriction) the achievable regret is O(T^1/2) (where T is the number of rounds). We term the extra T^1/6 factor, the "price of truthfulness".
关 键 词: 机器学习; 经济学; 强盗技术
课程来源: 视频讲座网
最后编审: 2020-04-15:chenxin
阅读次数: 54