Finite horizon exploration for path integral control problems
Course URL: http://videolectures.net/otee06_kappen_fhepi/
Lecturer: Bert Kappen
Institution: University of Nijmegen
Date: 2007-02-25
Language: English
Course description: We have recently developed a path integral method for solving a class of nonlinear stochastic control problems in the continuous domain [1, 2]. Path integral (PI) control can be applied to time-dependent finite-horizon tasks (motor control, coordination between agents) and to static tasks (which behave similarly to discounted-reward reinforcement learning). In this control formalism, the cost-to-go, or value function, can be solved explicitly as a function of the environment and rewards (as a path integral). Thus, PI control does not require solving the Bellman equation. Computing the path integral can itself be complex, but methods and concepts from statistical physics, such as Monte Carlo sampling or the Laplace approximation, can be used to obtain efficient approximations. The formalism can also be generalized to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time but also with each other. It was recently shown that this problem can be mapped onto a graphical model inference problem and solved using the junction tree algorithm. Depending on the complexity of the cost function, exact control solutions can be computed for, for instance, hundreds of agents [3].
Keywords: optimization methods; operations research; path integral
Source: VideoLectures.NET
Last reviewed: 2020-06-08, 吴雨秋 (volunteer course editor)
Views: 69