首页数学
0


具有相似信息的语境错误

Contextual Bandits with Similarity Information
课程网址: http://videolectures.net/colt2011_slivkins_bandits/  
主讲教师: Aleksandrs Slivkins
开课单位: 微软公司
开课时间: 2011-08-02
课程语种: 英语
中文简介:
在多臂强盗(MAB)问题中,在线算法产生一系列选择。在每一轮中,它从一组时间不变的选择中选择并获得与该替代方案相关的回报。虽然现在已经很好地理解了小型策略集的情况,但是最近的许多工作都集中在具有指数级或无限大型策略集的MAB问题上,其中需要假设额外的结构以使问题易于处理。特别是,最近的文献考虑了武器之间相似性的信息。我们在上下文匪徒的设置中考虑相似性信息,这是基本MAB问题的自然延伸,其中在每轮算法给出上下文之前 - 关于本轮中的收益的提示。通过在网页上放置广告来直接激发语境匪徒,这是赞助搜索中的关键问题之一。在上下文强盗设置中表示相似性信息的特别简单的方式是通过上下文 - 臂对之间的相似性距离,该相似性距离从相应的预期支付之间的差异之上界定。关于具有相似性的上下文匪的先前工作使用相似性空间的“均匀”分区,使得每个上下文 - 臂对由分区中的最近对来近似。基于“统一”分区的算法忽略了支付结构和上下文到达,这可能是浪费的。我们提出了基于自适应分区的算法,并利用“良性”支付和上下文到达而不牺牲最坏情况的性能。中心思想是在相似空间的高支付区域和上下文空间的流行区域中维持更精细的分区。我们的结果适用于其他几种设置,例如具有受限时间变化的MAB(Slivkins和Upfal,2008)和睡眠强盗(Kleinberg等,2008a)。
课程简介: In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well-understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem where before each round an algorithm is given the context – a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a similarity distance between the context-arm pairs which bounds from above the difference between the respective expected payoffs. Prior work on contextual bandits with similarity uses “uniform” partitions of the similarity space, so that each context-arm pair is approximated by the closest pair in the partition. Algorithms based on “uniform” partitions disregard the structure of the payoffs and the context arrivals, which is potentially wasteful. We present algorithms that are based on adaptive partitions, and take advantage of ”benign” payoffs and context arrivals without sacrificing the worst-case performance. The central idea is to maintain a finer partition in high-payoff regions of the similarity space and in popular regions of the context space. Our results apply to several other settings, e.g. MAB with constrained temporal change (Slivkins and Upfal, 2008) and sleeping bandits (Kleinberg et al., 2008a).
关 键 词: 相似信息; 数学问题; 数学分析; 策略算法
课程来源: 视频讲座网
最后编审: 2020-04-28:chenxin
阅读次数: 41