0


广义随机最短路径问题的启发式搜索

Heuristic Search for Generalized Stochastic Shortest Path MDPs
课程网址: http://videolectures.net/icaps2011_kolobov_heuristic/  
主讲教师: Andrey Kolobov
开课单位: 华盛顿大学
开课时间: 2011-07-21
课程语种: 英语
中文简介:
迄今为止,关于解决无限时域MDP的有效方法的研究主要集中在折扣MDP和更一般的随机最短路径问题(SSP)上。这些是MDP,其中1)是最优值函数V *,它是Bellman方程的唯一解,2)最优策略是贪婪策略w.r.t. V *。本文的主要贡献是描述了一类新的MDP,它具有明确定义的最佳解决方案,不符合上述1或2。我们称之为新类广义随机最短路径(GSSP)问题。 GSSP允许比SSP更一般的奖励结构,并包含几种已建立的MDP类型,包括SSP,正向限制,负向和折扣奖励模型。虽然现有的有效启发式搜索算法(如LAO *和LRTDP)无法保证收敛到GSSP的最优值函数,但我们提出了一种新的基于启发式搜索的算法族,FRET(查找,修改,消除陷阱)。初步的实证评估表明,FRET比Value Iteration更有效地解决了GSSP问题。
课程简介: Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper’s main contribution is the description of a new class of MDPs, that have well-defined optimal solutions that do not comply with either 1 or 2 above. We call our new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows more general reward structure than SSP and subsumes several established MDP types including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO* and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.
关 键 词: 随机最短路径问题; 最优的解决方案; 启发式搜索算法
课程来源: 视频讲座网
最后编审: 2020-06-27:yumf
阅读次数: 79