0


边信息的分段 - 平稳强盗问题

Piecewise-Stationary Bandit Problems with Side Information
课程网址: http://videolectures.net/icml09_yu_psbpsi/  
主讲教师: Jia Yuan Yu
开课单位: 麦吉尔大学
开课时间: 2009-08-26
课程语种: 英语
中文简介:
我们考虑一个顺序决策问题,其中奖励是由分段的静态分布产生的。但是,不同的奖励分配是未知的,可能会在未知的情况下发生变化。我们的方法对过去的奖励使用有限数量的观察,但不需要对变化频率的先验知识。尽管奖励过程具有对抗性质,但我们提供了一种算法,其遗憾,与基线有完全的分布知识。并且变化是O(k log(T)),其中k是直到时间T的变化次数。这与侧面观察不可用的情况形成对比,并且后悔至少是Ω(√T) 。我们还表明,我们的界限是一种自然的算法类型。该作品的早期版本出现在[YM09]中。
课程简介: We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards, but does not require prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to the baseline with perfect knowledge of the distributions and the changes, is O(k log(T)), where k is the number of changes up to time T. This is in contrast to the case where side observations are not available, and where the regret is at least Ω(√T). We also show that our bound is tight for a natural class of algorithms. An earlier version of this work appears in [YM09].
关 键 词: 顺序决策; 静态分布; 有限数量
课程来源: 视频讲座网
最后编审: 2019-04-24:cwx
阅读次数: 28