Some Recent Bandit Results
Course URL: http://videolectures.net/onlinelearning2012_cesa_bianchi_bandit_r...
Lecturer: Nicolò Cesa-Bianchi
Institution: University of Milan
Course date: 2013-05-28
Language: English
Course description: Research on multi-armed bandits is expanding in several directions. In this talk I will cover a number of recent results which reflect the variety of current approaches. The first part will be devoted to the analysis of stochastic bandits when reward distributions have heavy tails, which prevents the use of standard statistical estimators. Next, I will consider bandits with switching costs and show new upper and lower bounds under different assumptions on the reward processes. The last part of the talk will focus on combinatorial bandits, which include several interesting special cases such as ranking and multiple pulls. In this setting I will discuss the merits and limitations of the three main existing algorithmic approaches (Mirror Descent, Exp2, FPL). Joint work with: Sebastien Bubeck, Ofer Dekel, Elad Hazan, Sham Kakade, Gabor Lugosi, Ohad Shamir
Keywords: decision support; Exp2; FPL
Source: videolectures.net
Last reviewed: 2020-05-31, by 吴雨秋 (volunteer course editor)
Views: 19