

Use of variance estimation in the multi-armed bandit problem
Course URL: http://videolectures.net/otee06_audibert_uvema/
Lecturer: Jean-Yves Audibert
Institution: École des Mines de Paris
Date: 2007-02-25
Language: English
Description: An important aspect of most decision-making problems concerns the appropriate balance between exploitation (acting optimally according to the partial knowledge acquired so far) and exploration of the environment (acting sub-optimally in order to refine the current knowledge and improve future decisions). A typical example of this so-called exploration versus exploitation dilemma is the multi-armed bandit problem, for which many strategies have been developed. Here we investigate policies based on the choice of the arm having the highest upper-confidence bound, where the bound takes into account the empirical variance of the different arms. Such an algorithm was found earlier to outperform its peers in a series of numerical experiments. The main contribution of this paper is the theoretical investigation of this algorithm. Our contribution here is twofold. First, we prove that with probability at least 1 − β, the regret after n plays of a variant of the UCB algorithm (called β-UCB) is upper-bounded by a constant that scales linearly with log(1/β) but is independent of n. We also analyse a variant which is closer to the algorithm suggested earlier. We prove a logarithmic bound on the expected regret of this algorithm and argue that the bound scales favourably with the variance of the suboptimal arms.
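
To make the idea concrete, the Python sketch below (not taken from the lecture) illustrates a variance-aware upper-confidence rule in the spirit of UCB-V: each arm's index is its empirical mean plus a Bernstein-type confidence term built from the empirical variance, and the arm with the highest index is played. The exploration constant zeta, the reward-range bound b, and all function names are illustrative assumptions.

    import math
    import random

    def ucbv_index(mean, var, n_pulls, t, b=1.0, zeta=1.2):
        """Variance-aware upper-confidence index in the spirit of UCB-V.

        mean, var : empirical mean and variance of the arm's rewards
        n_pulls   : number of times this arm has been played
        t         : current round (total plays so far)
        b         : assumed upper bound on the reward range (illustrative)
        zeta      : exploration constant (illustrative)
        """
        e = zeta * math.log(t)
        return mean + math.sqrt(2.0 * var * e / n_pulls) + 3.0 * b * e / n_pulls

    def play(arms, horizon):
        """Pull each arm once, then always pull the arm with the highest index."""
        stats = [{"n": 0, "mean": 0.0, "m2": 0.0} for _ in arms]  # Welford accumulators
        for t in range(1, horizon + 1):
            if t <= len(arms):
                k = t - 1                                  # initial round-robin
            else:
                k = max(range(len(arms)),
                        key=lambda i: ucbv_index(stats[i]["mean"],
                                                 stats[i]["m2"] / stats[i]["n"],
                                                 stats[i]["n"], t))
            r = arms[k]()                                  # observe a reward in [0, 1]
            s = stats[k]
            s["n"] += 1
            d = r - s["mean"]
            s["mean"] += d / s["n"]
            s["m2"] += d * (r - s["mean"])                 # running sum of squared deviations
        return stats

    # Example: two Bernoulli arms with success probabilities 0.5 and 0.6.
    random.seed(0)
    arms = [lambda: float(random.random() < 0.5), lambda: float(random.random() < 0.6)]
    print(play(arms, 2000))

In this sketch the suboptimal arm's confidence term shrinks faster when its empirical variance is small, which is the intuition behind the favourable variance-dependent regret bound discussed in the abstract.
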
Keywords: UCB algorithm; decision making; algorithms
Source: VideoLectures.NET (视频讲座网)
Last reviewed: 2020-10-22 (chenxin)
Views: 298