Learning Dynamic Locomotion Skills for Terrains with Obstacles |
|
Course URL: | http://videolectures.net/rldm2015_van_de_panne_obstacles/ |
Lecturer: | Michiel van de Panne |
Institution: | University of British Columbia |
Date: | 2015-07-28 |
Language: | English |
Abstract: | Using reinforcement learning to develop motor skills for articulated figures is challenging because of state spaces and action spaces that are high dimensional and continuous. In this work, we learn control policies for dynamic gaits across terrains having sequences of gaps, walls, and steps. Results are demonstrated using physics-based simulations of a 21-link planar dog and a 7-link planar biped. Our approach is characterized by a number of features, including: non-parametric representation of the value function and the control policy; value iteration using batched positive-TD updates; localized epsilon-greedy exploration; and an action parameterization that is tailored for the problem domain. In support of the non-parametric representation, we further optimize for a task-specific distance metric. The policies are computed offline using repeated iterations of epsilon-greedy exploration and value iteration. The final control policies then run in real time over novel terrains. We evaluate the impact of the key features of our skill learning pipeline on the resulting performance. |
Keywords: | dynamic gaits; skill learning pipeline; state space |
Source: | VideoLectures.NET |
Data collected: | 2021-12-03:zkj |
Last reviewed: | 2021-12-03:zkj |
Views: | 66