

From Bandits to Experts: On the Value of More Information
Course URL: http://videolectures.net/explorationexploitation2011_shamir_bandi...
Lecturer: Ohad Shamir
Institution: Weizmann Institute of Science
Date: 2011-07-25
Language: English
Course description: Learning from Experts and Multi-armed Bandits are two of the most common settings studied in online learning. Whereas the first setting assumes that the performance of all k actions is revealed at the end of each round, the bandit setting assumes that only the performance of the chosen action is revealed, with a corresponding √k degradation in the provable regret guarantee. In this paper, we study a natural setting which interpolates between the experts and the bandit settings, where choosing an action also reveals side-information on the performance of some of the other actions. We develop practical algorithms with provable regret guarantees, as well as partially-matching lower bounds. The regret depends on non-trivial graph-theoretic properties of the information feedback structure, and exhibits an interesting trade-off between regret optimality and computational efficiency. We end by discussing some of the many open questions that remain.
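
Illustrative sketch (Python): the snippet below is a minimal toy simulation of the feedback model the description refers to, where choosing an action also reveals the losses of its neighbours in an observation graph. The importance-weighted exponential-weights update is a standard Exp3-style device used here only as an assumption; it is not necessarily the algorithm developed in the lecture, and the graph, loss distribution, and learning rate are illustrative.

# Toy simulation of the graph-feedback model described above: playing arm i also
# reveals the losses of i's neighbours in an observation graph G. The update is an
# importance-weighted exponential update in the spirit of Exp3, NOT necessarily the
# algorithm analysed in the lecture; graph, losses, and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

k, T, eta = 5, 10_000, 0.05          # arms, rounds, learning rate (assumed, untuned)

# Observation graph with self-loops: G[i, j] = 1 means playing i reveals arm j's loss.
G = np.eye(k)
G[0, 1] = G[1, 0] = 1.0              # example side-observations
G[2, 3] = G[3, 2] = 1.0

mean_loss = rng.uniform(0.2, 0.8, size=k)   # stationary Bernoulli losses (toy environment)
weights = np.ones(k)
alg_loss = 0.0

for t in range(T):
    p = weights / weights.sum()              # play distribution over arms
    i = rng.choice(k, p=p)                   # chosen arm
    losses = (rng.random(k) < mean_loss).astype(float)   # full loss vector (hidden)
    alg_loss += losses[i]

    observed = G[i] > 0                      # arms whose loss is revealed this round
    obs_prob = p @ G                         # obs_prob[j] = probability arm j is observed
    loss_est = np.where(observed, losses / obs_prob, 0.0)   # unbiased loss estimates
    weights *= np.exp(-eta * loss_est)
    weights /= weights.max()                 # numerical stability

print(f"algorithm loss: {alg_loss:.0f}, best fixed arm: {T * mean_loss.min():.0f}")

Dividing each observed loss by the probability of observing that arm (rather than of playing it) keeps the loss estimates unbiased while exploiting the extra observations, which is why denser observation graphs can reduce regret toward the experts-setting rate.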
Keywords: multi-armed bandits; online learning; bandit setting
Source: VideoLectures.NET
Last reviewed: 2019-04-14 (lxf)
Views: 32