广义随机最短路径问题的启发式搜索][Heuristic Search for Generalized Stochastic Shortest Path MDPs]_MOOC(慕课)境外开放课程

首页 → 计算机科学技术

广义随机最短路径问题的启发式搜索 Heuristic Search for Generalized Stochastic Shortest Path MDPs


课程网址:	http://videolectures.net/icaps2011_kolobov_heuristic/
主讲教师:	Andrey Kolobov
开课单位:	华盛顿大学
开课时间:	2011-07-21
课程语种:	英语
中文简介:	迄今为止，关于解决无限时域MDP的有效方法的研究主要集中在折扣MDP和更一般的随机最短路径问题（SSP）上。这些是MDP，其中1）是最优值函数V ，它是Bellman方程的唯一解，2）最优策略是贪婪策略w.r.t. V 。本文的主要贡献是描述了一类新的MDP，它具有明确定义的最佳解决方案，不符合上述1或2。我们称之为新类广义随机最短路径（GSSP）问题。 GSSP允许比SSP更一般的奖励结构，并包含几种已建立的MDP类型，包括SSP，正向限制，负向和折扣奖励模型。虽然现有的有效启发式搜索算法（如LAO *和LRTDP）无法保证收敛到GSSP的最优值函数，但我们提出了一种新的基于启发式搜索的算法族，FRET（查找，修改，消除陷阱）。初步的实证评估表明，FRET比Value Iteration更有效地解决了GSSP问题。
课程简介:	Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V. This paper’s main contribution is the description of a new class of MDPs, that have well-defined optimal solutions that do not comply with either 1 or 2 above. We call our new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows more general reward structure than SSP and subsumes several established MDP types including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.
关键词:	随机最短路径问题; 最优的解决方案; 启发式搜索算法
课程来源:	视频讲座网
最后编审:	2020-06-27：yumf
阅读次数:	106

服务热线：0574-88229129
电子邮件：info_lib@nbt.edu.cn
信息服务：图书馆305室
系统研发：图书馆303室

图书馆学生服务群：437507696
图书馆教工服务群：1038697975
QQ在线咨询
2013-2025 © 浙大宁波理工学院图书馆