Beating Bandits in Gradually Evolving Worlds
Course URL: http://videolectures.net/colt2013_jung_lee_bandits/
Lecturer: Chia-Jung Lee
Institution: Academia Sinica
Date: 2013-08-09
Language: English
Abstract: Consider the online convex optimization problem, in which a player has to choose actions iteratively and suffers corresponding losses according to some convex loss functions, and the goal is to minimize the regret. In the full-information setting, the player, after choosing her action, can observe the whole loss function in that round, while in the bandit setting, the only information the player can observe is the loss value of that action. Designing such bandit algorithms appears challenging, as the best regret currently achieved for general convex loss functions is much higher than that in the full-information setting, while for strongly convex loss functions, there is even a regret lower bound which is exponentially higher than that achieved in the full-information setting. To aim for smaller regrets, we adopt a relaxed two-point bandit setting in which the player can play two actions in each round and observe the loss values of those two actions. Moreover, we consider loss functions parameterized by their deviation D, which measures how fast they evolve, and we study how regrets depend on D. We show that two-point bandit algorithms can in fact achieve regrets matching those in the full-information setting in terms of D. More precisely, for convex loss functions, we achieve a regret of O(√D), while for strongly convex loss functions, we achieve a regret of O(ln D), which is much smaller than the Ω(√D) lower bound in the traditional bandit setting.
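The core technique behind two-point bandit algorithms is estimating a gradient from only the two loss values observed each round. Below is a minimal Python sketch of online gradient descent driven by the standard two-point gradient estimator, run on a slowly drifting sequence of quadratic losses; it illustrates the general approach rather than the paper's exact algorithm, and the step size eta, exploration radius delta, and the synthetic loss sequence are illustrative assumptions.

import numpy as np

def two_point_bandit_ogd(losses, d, eta=0.1, delta=0.01, radius=1.0, seed=0):
    """Online gradient descent with two-point bandit feedback.

    Each round the player probes the loss at x + delta*u and x - delta*u for a
    random unit direction u, and forms the standard two-point gradient estimate
    g = d / (2*delta) * (f(x + delta*u) - f(x - delta*u)) * u.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    total_loss = 0.0
    for f in losses:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                    # uniform random direction on the sphere
        y_plus = f(x + delta * u)                 # first observed loss value
        y_minus = f(x - delta * u)                # second observed loss value
        g = (d / (2 * delta)) * (y_plus - y_minus) * u   # two-point gradient estimate
        total_loss += 0.5 * (y_plus + y_minus)    # average loss of the two plays
        x = x - eta * g                           # gradient-descent step
        norm = np.linalg.norm(x)
        if norm > radius:                         # project back onto the feasible ball
            x *= radius / norm
    return x, total_loss

# Slowly evolving quadratic losses: the minimizer drifts a little each round,
# so the deviation D of the loss sequence stays small.
rng = np.random.default_rng(1)
d, T = 5, 1000
centers = np.cumsum(0.001 * rng.standard_normal((T, d)), axis=0)
losses = [lambda x, c=c: float(np.sum((x - c) ** 2)) for c in centers]
x_final, total = two_point_bandit_ogd(losses, d)
print("final point:", np.round(x_final, 3), " cumulative loss:", round(total, 3))

When the losses drift slowly, i.e. the deviation D is small, the iterate tracks the moving minimizer closely; this is the regime in which the paper's O(√D) and O(ln D) regret bounds apply.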
Keywords: bandit algorithms; online convex optimization; convex loss functions; loss function parameterization
Source: VideoLectures.NET open courses
Last reviewed: 2019-05-26 (cwx)
Views: 47