Kernelized Value Function Approximation for Reinforcement Learning
Course URL: http://videolectures.net/icml09_taylor_kvfa/
Lecturer: Gavin Taylor
Institution: Duke University
Date: 2009-08-26
Language: English
Course description: A recent surge in research in kernelized approaches to reinforcement learning has sought to bring the benefits of kernelized machine learning techniques to reinforcement learning. Kernelized reinforcement learning techniques are fairly new, and different authors have approached the topic with different assumptions and goals. Neither a unifying view nor an understanding of the pros and cons of different approaches has yet emerged. In this paper, we offer a unifying view of the different approaches to kernelized value function approximation for reinforcement learning. We show that, except for different approaches to regularization, Kernelized LSTD (KLSTD) is equivalent to a model-based approach that uses kernelized regression to find an approximate reward and transition model, and that Gaussian Process Temporal Difference learning (GPTD) returns a mean value function that is equivalent to these other approaches. We also demonstrate the relationship between our model-based approach and the earlier Gaussian Processes in Reinforcement Learning (GPRL). Finally, we decompose the Bellman error into the sum of transition error and reward error terms, and demonstrate through experiments that this decomposition can be helpful in choosing regularization parameters. (See the illustrative sketch at the end of this entry.)
Keywords: reinforcement learning; machine learning techniques; function approximation
Course source: VideoLectures.NET
Last reviewed: 2020-06-01 by 汪洁炜 (volunteer course editor)
Views: 171
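
Illustrative sketch. The model-based reading in the course description has two steps: kernelized regression over the sampled transitions yields an approximate reward model $\hat{R}$ and an approximate transition operator $\hat{P}$, and the value estimate $\hat{V}$ solves the resulting Bellman equation $\hat{V} = \hat{R} + \gamma \hat{P} \hat{V}$. Under that assumption, the Bellman error with respect to the true model $(R, P)$ is

$$\mathrm{BE}(\hat{V}) = R + \gamma P \hat{V} - \hat{V} = (R - \hat{R}) + \gamma (P - \hat{P})\hat{V},$$

a reward-error term plus a transition-error term, matching in spirit the decomposition mentioned in the description; the notation here is illustrative rather than the paper's own.

The Python sketch below is a minimal, self-contained illustration of this model-based construction under assumed choices: an RBF kernel, separate ridge regularizers lam_r and lam_p for the reward and transition models, and synthetic one-dimensional data. The function names and the exact placement of regularization are assumptions for illustration and are not taken from the paper or the lecture.

# Minimal sketch (illustrative, not the paper's code) of model-based kernelized
# value function approximation: kernel ridge regression builds an approximate
# reward model and an approximate transition operator over the sampled states,
# then the value estimate solves the resulting Bellman equation.
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix between row-wise state sets X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def model_based_kernel_vfa(states, rewards, next_states, gamma=0.95,
                           lam_r=1e-2, lam_p=1e-2, bandwidth=1.0):
    n = len(states)
    K = rbf_kernel(states, states, bandwidth)        # kernel among sampled states
    Kp = rbf_kernel(next_states, states, bandwidth)  # next states vs. sampled states

    # Kernel ridge regression for the reward model: smoothed rewards at the samples.
    r_hat = K @ np.linalg.solve(K + lam_r * np.eye(n), rewards)

    # Approximate transition operator: kernel-regression weights that predict a
    # function's values at the next states from its values at the sampled states.
    P_hat = Kp @ np.linalg.solve(K + lam_p * np.eye(n), np.eye(n))

    # Solve the approximate Bellman equation  V = r_hat + gamma * P_hat @ V.
    return np.linalg.solve(np.eye(n) - gamma * P_hat, r_hat)

# Tiny usage example on synthetic one-dimensional random-walk data.
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(50, 1))
S_next = np.clip(S + rng.normal(0.0, 0.05, size=S.shape), 0.0, 1.0)
R = S.ravel()  # reward grows with the state; purely illustrative
print(model_based_kernel_vfa(S, R, S_next)[:5])

Because lam_r and lam_p enter the reward and transition models separately in this sketch, the reward-error/transition-error split above is exactly the kind of diagnostic the description suggests for choosing the two regularization parameters.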