Piecewise-Stationary Bandit Problems with Side Observations

Course URL:	http://videolectures.net/icml09_yu_psbp/
Instructor:	Jia Yuan Yu
Institution:	McGill University
Start date:	2009-08-26
Language:	English
Course description:	We consider a sequential decision problem in which the rewards are generated by a piecewise-stationary distribution. The reward distributions are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards, but does not require prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to the baseline with perfect knowledge of the distributions and the changes, is O(k \log(T)), where k is the number of changes up to time T. This is in contrast to the case where side observations are not available, where the regret is at least \Omega(\sqrt{T}).
Keywords:	sequential decision-making; piecewise-stationary; reward distributions
Source:	VideoLectures.NET
Last reviewed:	2019-04-25 (cwx)
Views:	83
