
Near-Bayesian Exploration in Polynomial Time
Course URL: http://videolectures.net/icml09_kolter_nbept/
Lecturer: J. Zico Kolter
Institution: Carnegie Mellon University
Date: 2009-08-26
Language: English
Description: We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem: it maintains a distribution over possible models and acts to maximize expected reward. Unfortunately, the Bayesian solution is intractable in all but very restricted cases. In this paper we present a simple algorithm and prove that, with high probability, it performs epsilon-close to the true (intractable) optimal Bayesian policy after a small number of time steps (polynomial in the quantities describing the system). The algorithm and analysis are motivated by the so-called PAC-MDP approach, and extend such results to the setting of Bayesian RL. In this setting, we show that we can achieve lower sample complexity bounds than existing PAC-MDP algorithms, while using exploration strategies that are much greedier than the (extremely cautious) exploration strategies used by these existing algorithms.
Keywords: computer science; reinforcement learning; Bayesian
Source: VideoLectures.NET
