Near-Bayesian Exploration in Polynomial Time
Course URL: http://videolectures.net/icml09_kolter_nbept/
Lecturer: J. Zico Kolter
Institution: Carnegie Mellon University
Date: 2009-08-26
Language: English
Course description: We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering a distribution over possible models and acting to maximize expected reward; unfortunately, the Bayesian solution is intractable for all but very restricted cases. In this paper we present a simple algorithm, and prove that with high probability it is able to perform epsilon-close to the true (intractable) optimal Bayesian policy after some small (polynomial in quantities describing the system) number of time steps. The algorithm and analysis are motivated by the so-called PAC-MDP approach, and extend such results into the setting of Bayesian RL. In this setting, we show that we are able to achieve lower sample complexity bounds than existing PAC-MDP algorithms, while using exploration strategies that are much greedier than the (extremely cautious) exploration strategies used by these existing algorithms.
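
To make the model-based Bayesian exploration idea in the description concrete, here is a minimal Python sketch, assuming a small finite MDP with a known reward table: it keeps a Dirichlet posterior over transition probabilities and plans in the posterior-mean model with a count-based exploration bonus that fades as state-action pairs are visited. This is only an illustration of the exploration/exploitation mechanism, not the paper's algorithm; the constants, reward table, and stand-in environment are assumptions made for the example.

import numpy as np

# Minimal sketch of model-based Bayesian exploration on a small finite MDP.
# Not the paper's algorithm; constants, the reward table, and the stand-in
# environment below are illustrative assumptions.

n_states, n_actions = 5, 2
gamma, beta, horizon = 0.95, 2.0, 50              # discount, bonus weight, planning sweeps
rng = np.random.default_rng(0)

alpha = np.ones((n_states, n_actions, n_states))  # Dirichlet counts over next states
reward = rng.random((n_states, n_actions))        # assumed-known reward table (sketch only)

def plan(alpha):
    """Value iteration on the posterior-mean MDP plus a count-based exploration bonus."""
    p_mean = alpha / alpha.sum(axis=2, keepdims=True)  # posterior-mean transition model
    bonus = beta / (1.0 + alpha.sum(axis=2))           # fades as (s, a) visit counts grow
    v = np.zeros(n_states)
    for _ in range(horizon):
        q = reward + bonus + gamma * (p_mean @ v)      # (S, A) action values
        v = q.max(axis=1)
    return q

def env_step(s, a):
    """Stand-in for the unknown true MDP; replace with the real environment."""
    return int(rng.integers(n_states))

s = 0
for t in range(200):
    a = int(plan(alpha)[s].argmax())   # act greedily w.r.t. the bonus-augmented model
    s_next = env_step(s, a)
    alpha[s, a, s_next] += 1.0         # Bayesian (Dirichlet) update from the observed transition
    s = s_next

Setting the bonus to zero recovers plain greedy planning in the posterior-mean model, which is exactly the setting where under-exploration becomes a problem; the bonus term is what drives the agent toward poorly visited state-action pairs early on.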
Keywords: computer science; reinforcement learning; Bayesian methods
Source: 视频讲座网
Last reviewed: 2020-09-24 (dingaq)