0


缩小差距:最优POMDP的解决方案改进的界限

Closing the Gap: Improved Bounds on Optimal POMDP Solutions
课程网址: http://videolectures.net/icaps2011_poupart_solutions/  
主讲教师: Pascal Poupart
开课单位: 滑铁卢大学
开课时间: 2011-07-21
课程语种: 英语
中文简介:
POMDP算法近年来取得了重大进展,允许从业者找到解决日益严重的问题的良好解决方案。大多数方法(包括基于点的方法和策略迭代技术)通过细化最优值函数的下限来操作。几种方法(例如,HSVI2,SARSOP,基于网格的方法和在线前向搜索)也细化上限。然而,通过上限逼近最佳值函数在计算上是昂贵的,因此通常牺牲紧密度来提高效率(例如,锯齿近似)。在本文中,我们描述了一种有效计算更严格界限的新方法:i)对可达信念进行优先广度优先搜索,ii)使用增广POMDP传播上限改进和iii)使用精确线性编程(而不是锯齿)近似)用于上限插值。因此,我们可以更紧凑地表示边界,并显着减少几个基准问题上下界之间的差距。
课程简介: POMDP algorithms have made significant progress in recent years by allowing practitioners to find good solutions to increasingly large problems. Most approaches (including point-based and policy iteration techniques) operate by refining a lower bound of the optimal value function. Several approaches (e.g., HSVI2, SARSOP, grid-based approaches and online forward search) also refine an upper bound. However, approximating the optimal value function by an upper bound is computationally expensive and therefore tightness is often sacrificed to improve efficiency (e.g., sawtooth approximation). In this paper, we describe a new approach to efficiently compute tighter bounds by i) conducting a prioritized breadth first search over the reachable beliefs, ii) propagating upper bound improvements with an augmented POMDP and iii) using exact linear programming (instead of the sawtooth approximation) for upper bound interpolation. As a result, we can represent the bounds more compactly and significantly reduce the gap between upper and lower bounds on several benchmark problems.
关 键 词: 最优值函数; 优先广度; 线性规划
课程来源: 视频讲座网
最后编审: 2020-05-31:吴雨秋(课程编辑志愿者)
阅读次数: 68