

The bounded regret problem in multi-armed bandits

Bounded regret in stochastic multi-armed bandits
Course URL: http://videolectures.net/colt2013_bubeck_regret/
Lecturer: Sébastien Bubeck
Institution: Princeton University
Date: 2013-08-09
Language: English
Abstract: We study the stochastic multi-armed bandit problem when one knows the value μ* of an optimal arm, as well as a positive lower bound on the smallest positive gap Δ. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows Δ, and bounded regret of order 1/Δ is not possible if one only knows μ*.
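As a hedged illustration of the quantities in the abstract, the sketch below computes the optimal mean μ*, the smallest positive gap Δ, and the cumulative pseudo-regret R_n = Σ_i Δ_i · T_i(n), where Δ_i = μ* − μ_i and T_i(n) is the number of pulls of arm i after n rounds. It is not the randomized policy proposed in the talk; the function names and example numbers are hypothetical. "Bounded regret" means the pull counts of suboptimal arms, and hence R_n, stay below a constant as n grows.

```python
from typing import List, Sequence


def gaps(means: Sequence[float]) -> List[float]:
    """Per-arm gap Delta_i = mu* - mu_i, where mu* is the best arm mean."""
    mu_star = max(means)
    return [mu_star - m for m in means]


def smallest_positive_gap(means: Sequence[float]) -> float:
    """Delta: the smallest strictly positive gap.

    Assumes at least one arm is strictly suboptimal.
    """
    return min(g for g in gaps(means) if g > 0)


def pseudo_regret(means: Sequence[float], pulls: Sequence[int]) -> float:
    """Cumulative pseudo-regret R_n = sum_i Delta_i * T_i(n).

    Equivalent to n * mu* - sum_i T_i(n) * mu_i with n = sum(pulls):
    only pulls of suboptimal arms (Delta_i > 0) contribute.
    """
    return sum(g * t for g, t in zip(gaps(means), pulls))


if __name__ == "__main__":
    means = [0.5, 0.4, 0.3]   # hypothetical arm means; mu* = 0.5
    pulls = [90, 7, 3]        # hypothetical pull counts after n = 100 rounds
    print("Delta =", smallest_positive_gap(means))
    print("R_n   =", pseudo_regret(means, pulls))
```

Writing R_n as a sum over gaps makes the link to the abstract explicit: a policy has bounded regret exactly when the expected number of pulls of every arm with Δ_i > 0 stays bounded, rather than growing like log n as for standard index policies.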
Keywords: bounded regret; optimal solution; randomized policy
Source: VideoLectures.NET
Last edited: 2019-10-17 (cwx)