Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains
Course URL: http://videolectures.net/ecmlpkdd2011_poupart_domains/
Lecturer: Pascal Poupart
Institution: University of Waterloo
Date: 2011-11-30
Language: English
Course description: Planning as inference has recently emerged as a versatile approach to decision-theoretic planning and reinforcement learning for single-agent and multi-agent systems in fully and partially observable domains with discrete and continuous variables. Because planning as inference essentially tackles a non-convex optimization problem when the states are partially observable, there is a need for techniques that can robustly escape local optima. We investigate the local optima of finite state controllers in single-agent partially observable Markov decision processes (POMDPs) that are optimized by expectation maximization (EM). We show that EM converges to controllers that are optimal with respect to a one-step look-ahead. To escape local optima, we propose two algorithms: the first adds nodes to the controller to ensure optimality with respect to a multi-step look-ahead, while the second splits nodes in a greedy fashion to improve reward likelihood. The approaches are demonstrated empirically on benchmark problems.
Keywords: continuous variables; discrete; non-convex optimization problem
Source: VideoLectures.NET
Last reviewed: 2019-04-03:lxf
Views: 56