Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes
Course URL: http://videolectures.net/ecmlpkdd2011_furmston_lagrange/
Lecturer: Thomas Furmston
Institution: University College London
Date: 2011-11-30
Language: English
Course description: Solving finite-horizon Markov Decision Processes with stationary policies is a computationally difficult problem. Our dynamic dual decomposition approach uses Lagrange duality to decouple this hard problem into a sequence of tractable sub-problems. The resulting procedure is a straightforward modification of standard non-stationary Markov Decision Process solvers and gives an upper bound on the total expected reward. The empirical performance of the method suggests that not only is it a rapidly convergent algorithm, but that it also performs favourably compared to standard planning algorithms such as policy gradients and lower-bound procedures such as Expectation Maximisation.
Keywords: stationary policies; Markov decision processes; dual decomposition
Source: VideoLectures.NET
Last reviewed: 2020-10-22:chenxin
Views: 114