Binary Action Search for Learning Continuous-Action Control Policies
Course URL: http://videolectures.net/icml09_pazis_bas/
Lecturer: Jason Pazis
Institution: Technical University of Crete
Date: 2009-08-26
Language: English
Abstract: Reinforcement Learning methods for controlling stochastic processes typically assume a small, discrete action space. While continuous action spaces are quite common in real-world problems, the most common approach still employed in practice is a coarse discretization of the action space. This paper presents a novel method, called Binary Action Search, for realizing continuous-action policies by efficiently searching the entire action range through increment and decrement modifications to the values of the action variables, according to an internal binary policy defined over an augmented state space. The proposed approach essentially approximates any continuous action space to arbitrary resolution and can be combined with any discrete-action reinforcement learning algorithm for learning continuous-action policies. Binary Action Search eliminates the restrictive modification steps of Adaptive Action Modification and requires no temporal action locality in the domain. The approach is coupled with two well-known reinforcement learning algorithms (Least-Squares Policy Iteration and Fitted Q-Iteration), and its use and properties are thoroughly investigated and demonstrated on the continuous state-action Inverted Pendulum, Double Integrator, and Car on the Hill domains.
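
To make the search mechanism concrete, below is a minimal Python sketch (not from the lecture) of how an action could be selected at decision time. The names binary_action_search, binary_policy, a_min, a_max, and n_steps are illustrative assumptions; binary_policy stands for any learned discrete-action policy (e.g. from LSPI or Fitted Q-Iteration) that, queried on the augmented state (state, current action candidate), indicates whether to increase or decrease the candidate.

```python
def binary_action_search(state, binary_policy, a_min, a_max, n_steps=8):
    """Select a continuous action from [a_min, a_max] by binary search.

    binary_policy(state, a) is a hypothetical two-action policy queried
    on the augmented state (state, current candidate a); it returns +1
    to increase or -1 to decrease the candidate action.
    """
    a = 0.5 * (a_min + a_max)        # start at the midpoint of the range
    delta = 0.25 * (a_max - a_min)   # first increment/decrement step
    for _ in range(n_steps):
        a += binary_policy(state, a) * delta
        delta *= 0.5                 # halve the step, as in binary search
    # Final resolution is roughly (a_max - a_min) / 2**n_steps.
    return a

# Toy usage: a hand-coded binary policy steering the action toward 0.3.
if __name__ == "__main__":
    policy = lambda s, a: 1.0 if a < 0.3 else -1.0
    print(binary_action_search(state=None, binary_policy=policy,
                               a_min=-1.0, a_max=1.0))
```

The point of this structure, per the abstract, is that only about log2(resolution) queries of the internal policy are needed per decision, so the learning problem stays a two-action one no matter how finely the continuous range is resolved.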
Keywords: stochastic processes; discretization; Least-Squares Policy Iteration
Source: VideoLectures.NET
Last reviewed: 2021-04-09 by yumf
Views: 34