

From Bandits to Experts: On the Value of More Information
Course URL: http://videolectures.net/explorationexploitation2011_shamir_bandi...
Lecturer: Ohad Shamir
Institution: Weizmann Institute of Science
Date: 2011-07-25
Language: English
Course description: Learning from Experts and Multi-armed Bandits are two of the most common settings studied in online learning. Whereas the first setting assumes that the performance of all k actions is revealed at the end of each round, the bandit setting assumes that only the performance of the chosen action is revealed, with a corresponding √k degradation in the provable regret guarantee. In this paper, we study a natural setting which interpolates between the experts and the bandit settings, where choosing an action also reveals side-information on the performance of some of the other actions. We develop practical algorithms with provable regret guarantees, as well as partially-matching lower bounds. The regret depends on non-trivial graph-theoretic properties of the information feedback structure, and exhibits an interesting trade-off between regret optimality and computational efficiency. We end by discussing some of the many open questions that remain.
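
Illustrative sketch (Python): the snippet below is a minimal toy simulation of the feedback model the description refers to, where choosing an action also reveals the losses of its neighbours in an observation graph. The importance-weighted exponential-weights update is a standard Exp3-style device used here only as an assumption; it is not necessarily the algorithm developed in the lecture, and the graph, loss distribution, and learning rate are illustrative.

# Toy simulation of the graph-feedback model described above: playing arm i also
# reveals the losses of i's neighbours in an observation graph G. The update is an
# importance-weighted exponential update in the spirit of Exp3, NOT necessarily the
# algorithm analysed in the lecture; graph, losses, and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

k, T, eta = 5, 10_000, 0.05          # arms, rounds, learning rate (assumed, untuned)

# Observation graph with self-loops: G[i, j] = 1 means playing i reveals arm j's loss.
G = np.eye(k)
G[0, 1] = G[1, 0] = 1.0              # example side-observations
G[2, 3] = G[3, 2] = 1.0

mean_loss = rng.uniform(0.2, 0.8, size=k)   # stationary Bernoulli losses (toy environment)
weights = np.ones(k)
alg_loss = 0.0

for t in range(T):
    p = weights / weights.sum()              # play distribution over arms
    i = rng.choice(k, p=p)                   # chosen arm
    losses = (rng.random(k) < mean_loss).astype(float)   # full loss vector (hidden)
    alg_loss += losses[i]

    observed = G[i] > 0                      # arms whose loss is revealed this round
    obs_prob = p @ G                         # obs_prob[j] = probability arm j is observed
    loss_est = np.where(observed, losses / obs_prob, 0.0)   # unbiased loss estimates
    weights *= np.exp(-eta * loss_est)
    weights /= weights.max()                 # numerical stability

print(f"algorithm loss: {alg_loss:.0f}, best fixed arm: {T * mean_loss.min():.0f}")

Dividing each observed loss by the probability of observing that arm (rather than of playing it) keeps the loss estimates unbiased while exploiting the extra observations, which is why denser observation graphs can reduce regret toward the experts-setting rate.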
Keywords: multi-armed bandits; online learning; bandit setting
Source: VideoLectures.NET
Last reviewed: 2019-04-14 (lxf)
Views: 32