
Free Energy and Relative Entropy Dualities: Connections to Path Integral Control and Applications to Robotics
Course URL: http://videolectures.net/cyberstat2012_theodorou_free_energy/
Lecturer: Evangelos Theodorou
Institution: University of Southern California
Date: 2012-10-16
Language: English
Course description: While optimal control and reinforcement learning are fundamental frameworks for learning and control applications, their application to high-dimensional control systems of the complexity of humanoid and biomimetic robots has largely been impossible so far. Among the key problems is that classical value-function-based approaches run into severe limitations in continuous state-action spaces due to issues of value function approximation. Additionally, the computational complexity and time of exploring high-dimensional state-action spaces quickly exceed practical feasibility. As an alternative, researchers have turned to trajectory-based reinforcement learning, which sacrifices global optimality in favor of being applicable to high-dimensional state-action spaces. Model-based algorithms, inspired by ideas from differential dynamic programming, have demonstrated some success when models are accurate. Model-free trajectory-based reinforcement learning has been limited by slow learning and the need to tune many open parameters. Recently, reinforcement learning has moved towards combining classical techniques from stochastic optimal control and dynamic programming with learning techniques from statistical estimation theory, and the connection between SDEs and PDEs via the Feynman-Kac lemma. In this talk, I will discuss theoretical developments and extensions of path integral control to iterative cases and present algorithms for policy improvement in continuous state-action spaces. I will provide information-theoretic interpretations and extensions based on the fundamental relationship between free energy and relative entropy. This relationship provides an alternative view of stochastic optimal control theory that does not rely on the Bellman principle.
I will demonstrate the applicability of the proposed algorithms to control and learning of humanoid, manipulator, and tendon-driven robots, and propose future directions in terms of theory and applications.
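The free energy / relative entropy relationship the talk builds on can be sketched numerically. The free energy of a trajectory cost S at temperature lambda, F = -lambda * log E_p[exp(-S/lambda)], lower-bounds the expected cost E_p[S] by Jensen's inequality, and path-integral policy improvement reweights sampled rollouts by exp(-S/lambda). The sketch below is illustrative only, not the speaker's implementation: the costs, temperature, and parameter perturbations are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: costs S_k from K sampled rollouts, temperature lambda.
lam = 0.5
S = rng.uniform(0.0, 2.0, size=100)

# Path-integral weights: softmax of negative cost over the rollouts.
w = np.exp(-S / lam)
w /= w.sum()

# Free energy F = -lam * log E_p[exp(-S/lam)].
# By Jensen's inequality, F <= E_p[S] for any lam > 0.
F = -lam * np.log(np.mean(np.exp(-S / lam)))
assert F <= S.mean()

# Policy-improvement step in the style of path-integral methods:
# the updated parameter is the cost-weighted average of sampled
# parameter perturbations, favoring low-cost rollouts.
theta_samples = rng.normal(0.0, 1.0, size=100)
theta_new = np.sum(w * theta_samples)
```

As lambda decreases, the weights concentrate on the lowest-cost rollouts; as lambda grows, F approaches the plain expected cost, recovering the averaging regime. None of this relies on the Bellman principle, which is the point of the duality view described in the abstract.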
Keywords: robotics; reinforcement learning; path integral control; iterative cases
Source: VideoLectures.NET
Last reviewed: 2019-03-16: lxf
Views: 130