Policy Search: Methods and Applications
Course URL: http://videolectures.net/icml2015_neumann_peters_policy_search/
Lecturers: Jan Peters; Gerhard Neumann
Institution: Department of Computer Science, Technische Universität Darmstadt
Date: 2015-12-05
Language: English
Description: Policy search is a subfield of reinforcement learning that focuses on finding good parameters for a given policy parametrization. It is well suited to robotics because it can cope with high-dimensional state and action spaces, one of the main challenges in robot learning. We review recent successes of both model-free and model-based policy search in robot learning. Model-free policy search is a general approach to learning policies from sampled trajectories. We classify model-free methods by their policy evaluation strategy, policy update strategy, and exploration strategy, and present a unified view of existing algorithms. Learning a policy is often easier than learning an accurate forward model, so model-free methods are used more frequently in practice. However, every sampled trajectory requires interaction with the robot, which can be time-consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. The simulator then generates the trajectories used for policy learning. For both model-free and model-based policy search methods, we review their respective properties and their applicability to robotic systems.
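The three-way classification of model-free methods (policy evaluation, policy update, exploration) can be made concrete with a minimal sketch. It assumes a hypothetical toy 1-D linear system x_{t+1} = x_t + u_t with a one-parameter linear policy u_t = -gain * x_t, and it uses a cross-entropy-method update, chosen here only as one simple episodic scheme for illustration; it is not an algorithm attributed to the lecture. The names `rollout` and `cem_policy_search` are illustrative.

```python
import random


def rollout(gain, x0=1.0, horizon=20):
    """Simulate one episode of a hypothetical toy system x_{t+1} = x_t + u_t
    under the linear policy u_t = -gain * x_t and return its total reward."""
    x, total_reward = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        total_reward -= x * x + 0.1 * u * u  # quadratic cost as negative reward
        x = x + u
    return total_reward


def cem_policy_search(iterations=50, samples=30, n_elite=6, seed=0):
    """Episodic model-free policy search; the cross-entropy-method update is
    an illustrative choice, not a method taken from the lecture."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # Gaussian search distribution over the policy gain
    for _ in range(iterations):
        # Exploration: perturb the policy parameter once per episode.
        gains = [rng.gauss(mean, std) for _ in range(samples)]
        # Policy evaluation: score each parameter by the return of one rollout.
        ranked = sorted(gains, key=rollout, reverse=True)
        # Policy update: refit the Gaussian to the highest-return samples.
        elite = ranked[:n_elite]
        mean = sum(elite) / n_elite
        var = sum((g - mean) ** 2 for g in elite) / n_elite
        std = max(var ** 0.5, 0.01)  # floor keeps a little exploration alive
    return mean


if __name__ == "__main__":
    print(f"learned gain: {cem_policy_search():.3f}")
```

For this toy system the learned gain should land near roughly 0.92, the infinite-horizon LQR gain, which makes an easy sanity check. Each call to `rollout` stands in for one interaction with the robot, which is exactly the cost that model-based policy search tries to reduce by sampling trajectories from a learned simulator instead.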
Keywords: policy search; reinforcement learning; high-dimensional states; policy evaluation
Source: VideoLectures.NET
Data collection: 2023-04-24:chenxin01
Last edited: 2023-05-18:chenxin01