Near-Bayesian Exploration in Polynomial Time
Course URL: http://videolectures.net/icml09_kolter_nbept/
Lecturer: J. Zico Kolter
Institution: Carnegie Mellon University
Date: 2009-08-26
Language: English
Course description: We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering a distribution over possible models and acting to maximize expected reward; unfortunately, the Bayesian solution is intractable for all but very restricted cases. In this paper we present a simple algorithm, and prove that with high probability it is able to perform epsilon-close to the true (intractable) optimal Bayesian policy after some small (polynomial in quantities describing the system) number of time steps. The algorithm and analysis are motivated by the so-called PAC-MDP approach, and extend such results into the setting of Bayesian RL. In this setting, we show that we are able to achieve lower sample complexity bounds than existing PAC-MDP algorithms, while using exploration strategies that are much greedier than the (extremely cautious) exploration strategies used by these existing algorithms.
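
To make the model-based Bayesian exploration idea in the description concrete, here is a minimal Python sketch, assuming a small finite MDP with a known reward table: it keeps a Dirichlet posterior over transition probabilities and plans in the posterior-mean model with a count-based exploration bonus that fades as state-action pairs are visited. This is only an illustration of the exploration/exploitation mechanism, not the paper's algorithm; the constants, reward table, and stand-in environment are assumptions made for the example.

import numpy as np

# Minimal sketch of model-based Bayesian exploration on a small finite MDP.
# Not the paper's algorithm; constants, the reward table, and the stand-in
# environment below are illustrative assumptions.

n_states, n_actions = 5, 2
gamma, beta, horizon = 0.95, 2.0, 50              # discount, bonus weight, planning sweeps
rng = np.random.default_rng(0)

alpha = np.ones((n_states, n_actions, n_states))  # Dirichlet counts over next states
reward = rng.random((n_states, n_actions))        # assumed-known reward table (sketch only)

def plan(alpha):
    """Value iteration on the posterior-mean MDP plus a count-based exploration bonus."""
    p_mean = alpha / alpha.sum(axis=2, keepdims=True)  # posterior-mean transition model
    bonus = beta / (1.0 + alpha.sum(axis=2))           # fades as (s, a) visit counts grow
    v = np.zeros(n_states)
    for _ in range(horizon):
        q = reward + bonus + gamma * (p_mean @ v)      # (S, A) action values
        v = q.max(axis=1)
    return q

def env_step(s, a):
    """Stand-in for the unknown true MDP; replace with the real environment."""
    return int(rng.integers(n_states))

s = 0
for t in range(200):
    a = int(plan(alpha)[s].argmax())   # act greedily w.r.t. the bonus-augmented model
    s_next = env_step(s, a)
    alpha[s, a, s_next] += 1.0         # Bayesian (Dirichlet) update from the observed transition
    s = s_next

Setting the bonus to zero recovers plain greedy planning in the posterior-mean model, which is exactly the setting where under-exploration becomes a problem; the bonus term is what drives the agent toward poorly visited state-action pairs early on.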
Keywords: computer science; reinforcement learning; Bayesian methods
Source: 视频讲座网
Last reviewed: 2020-09-24 (dingaq)