Piecewise-Stationary Bandit Problems with Side Observations
Course URL: http://videolectures.net/icml09_yu_psbp/
Lecturer: Jia Yuan Yu
Institution: McGill University
Date: 2009-08-26
Language: English
Abstract: We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards, but does not require prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to the baseline with perfect knowledge of the distributions and the changes, is O(k log T), where k is the number of changes up to time T. This is in contrast to the case where side observations are not available, where the regret is at least Ω(√T).
Keywords: sequential decision making; piecewise stationarity; reward distributions
Source: VideoLectures.NET
Last reviewed: 2019-04-25:cwx
Views: 31