广义随机最短路径问题的启发式搜索Heuristic Search for Generalized Stochastic Shortest Path MDPs |
|
课程网址: | http://videolectures.net/icaps2011_kolobov_heuristic/ |
主讲教师: | Andrey Kolobov |
开课单位: | 华盛顿大学 |
开课时间: | 2011-07-21 |
课程语种: | 英语 |
中文简介: | 迄今为止,关于解决无限时域MDP的有效方法的研究主要集中在折扣MDP和更一般的随机最短路径问题(SSP)上。这些是MDP,其中1)是最优值函数V *,它是Bellman方程的唯一解,2)最优策略是贪婪策略w.r.t. V *。本文的主要贡献是描述了一类新的MDP,它具有明确定义的最佳解决方案,不符合上述1或2。我们称之为新类广义随机最短路径(GSSP)问题。 GSSP允许比SSP更一般的奖励结构,并包含几种已建立的MDP类型,包括SSP,正向限制,负向和折扣奖励模型。虽然现有的有效启发式搜索算法(如LAO *和LRTDP)无法保证收敛到GSSP的最优值函数,但我们提出了一种新的基于启发式搜索的算法族,FRET(查找,修改,消除陷阱)。初步的实证评估表明,FRET比Value Iteration更有效地解决了GSSP问题。 |
课程简介: | Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper’s main contribution is the description of a new class of MDPs, that have well-defined optimal solutions that do not comply with either 1 or 2 above. We call our new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows more general reward structure than SSP and subsumes several established MDP types including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO* and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration. |
关 键 词: | 随机最短路径问题; 最优的解决方案; 启发式搜索算法 |
课程来源: | 视频讲座网 |
最后编审: | 2020-06-27:yumf |
阅读次数: | 79 |