


Online Kernel Selection for Bayesian Reinforcement Learning
Course URL: http://videolectures.net/icml08_reisinger_oks/
Lecturer: Joseph Reisinger
Institution: University of Texas
Date: 2008-08-04
Language: English
Course Description: Kernel-based Bayesian methods for Reinforcement Learning (RL) such as Gaussian Process Temporal Difference (GPTD) are particularly promising because they rigorously treat uncertainty in the value function and make it easy to specify prior knowledge. However, the choice of prior distribution significantly affects the empirical performance of the learning agent, and little work has been done extending existing methods for prior model selection to the online setting. This paper develops Replacing-Kernel RL, an online model selection method for GPTD using population-based search. Replacing-Kernel RL is compared to standard GPTD and tile-coding on several RL domains, and is shown to yield significantly better asymptotic performance for many different kernel families. Furthermore, the resulting kernels capture an intuitively useful notion of prior state covariance that may nevertheless be difficult to capture manually.
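The core idea of population-based kernel selection can be sketched in a few lines. The toy below is an illustration, not the paper's Replacing-Kernel RL algorithm: it evolves the lengthscale of an assumed RBF (squared-exponential) kernel, and uses the GP log marginal likelihood on regression data as a stand-in for the episodic return that would score kernels in the RL setting. All function names here are invented for the sketch.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale, variance=1.0):
    """Squared-exponential covariance between two sets of states."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_fitness(X, y, lengthscale, noise=0.1):
    """GP log marginal likelihood of targets y -- here a stand-in for the
    agent's return, which the paper would use to score candidate kernels."""
    K = rbf_kernel(X, X, lengthscale) + noise**2 * np.eye(len(X))
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * (y @ alpha + logdet + len(X) * np.log(2 * np.pi))

def population_search(X, y, pop_size=20, generations=15, seed=0):
    """Evolve a population of lengthscales: keep the fittest half, then
    refill with multiplicatively mutated copies of the survivors."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.05, 5.0, pop_size)
    for _ in range(generations):
        scores = np.array([gp_fitness(X, y, l) for l in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]
        children = elite * np.exp(0.2 * rng.standard_normal(len(elite)))
        pop = np.concatenate([elite, children])
    scores = np.array([gp_fitness(X, y, l) for l in pop])
    return pop[np.argmax(scores)]

# Example: recover a sensible lengthscale for a smooth 1-D target.
X = np.linspace(0.0, 1.0, 30)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
best = population_search(X, y)
```

In the paper's online setting the fitness of a kernel is estimated from the performance of a GPTD agent using it, and the "replacing" step swaps a new kernel into a running agent; the selection-and-mutation loop above captures only the population-search skeleton.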
Keywords: reinforcement learning; Bayesian methods; prior distribution; covariance
Source: VideoLectures.NET
Last reviewed: 2020-06-08 (yumf)
Views: 35