A Non-Parametric Approach to Dynamic Programming
Course URL: http://videolectures.net/nips2011_kroemer_programming/
Lecturer: Oliver B Kroemer
Institution: Technical University of Darmstadt
Course date: 2012-01-25
Language: English
Course description: In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation that uses kernel density estimation to represent the system. The true form of the value function for this model can be determined and computed using Galerkin's method. We also present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.
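Illustrative sketch: to make the policy evaluation setting concrete, the following is a minimal Least-Squares Temporal Difference (LSTD) example, one of the well-known methods the description says can be derived with Galerkin's method. It is not the lecture's proposed kernel density approach. The toy system (deterministic dynamics s' = 0.9 s, reward -s^2), the polynomial features, and all function names are assumptions chosen for illustration only.

```python
def features(s):
    """Polynomial features [1, s, s^2] (an illustrative, assumed choice)."""
    return [1.0, s, s * s]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstd(transitions, gamma):
    """Return weights w such that V(s) is approximated by features(s) . w.

    transitions: iterable of (s, r, s_next) samples gathered under the
    policy being evaluated.
    """
    k = len(features(0.0))
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, r, s_next in transitions:
        phi, phi_next = features(s), features(s_next)
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                # A accumulates phi(s) (phi(s) - gamma * phi(s'))^T
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


GAMMA = 0.9   # discount factor (assumed)
DECAY = 0.9   # toy dynamics s' = DECAY * s (assumed)

states = [i / 10.0 for i in range(-10, 11)]
transitions = [(s, -s * s, DECAY * s) for s in states]
w = lstd(transitions, GAMMA)


def value(s):
    """Estimated value of state s under the fitted weights."""
    return sum(wi * fi for wi, fi in zip(w, features(s)))
```

For this toy system the true value function is V(s) = -s^2 / (1 - GAMMA * DECAY^2), which lies in the span of the chosen features, so the fitted weights should match it up to floating-point error. With noisy dynamics or features that cannot represent the value function, LSTD would return only an approximation.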
Keywords: Computer Science; Machine Learning; Reinforcement Learning
Source: VideoLectures.NET
Last reviewed: 2020-06-02, 毛岱琦 (volunteer course editor)
Views: 43