
Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs
Course URL: http://videolectures.net/icml08_doshi_rlw/
Lecturer: Finale Doshi
Institution: University of Cambridge
Date: 2008-08-06
Language: English
Course description: Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains because they optimally trade off actions that increase an agent's knowledge against actions that increase its reward. Unfortunately, most POMDPs are defined by a large number of parameters that are difficult to specify from domain knowledge alone. In this paper, we treat the POMDP model parameters as additional hidden state in a "model-uncertainty" POMDP and develop an approximate algorithm for planning in this larger POMDP. The approximation, coupled with model-directed queries, allows the planner to actively learn good policies. We demonstrate our approach on several standard POMDP problems.
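The idea of treating model parameters as hidden state, together with Bayes-risk action selection named in the title, can be illustrated with a small sketch. It assumes the uncertainty over models is represented by a finite set of sampled POMDPs, each with a posterior weight and precomputed Q-values for every action at the agent's current belief. The agent takes the action with the lowest expected loss (Bayes risk) across models and, if even that risk exceeds a threshold, asks a query instead. The names `SampledModel`, `reweight`, `bayes_risk`, `select_action` and `risk_threshold` are hypothetical, and this is not presented as the authors' exact algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SampledModel:
    """One POMDP drawn from the posterior over model parameters (hypothetical)."""
    weight: float               # posterior probability mass of this model
    q_values: Sequence[float]   # Q(b, a) for each action a, assumed precomputed


def reweight(models: List[SampledModel],
             likelihoods: Sequence[float]) -> List[SampledModel]:
    """Bayesian update: scale each weight by the likelihood that model
    assigns to the latest observation, then renormalize."""
    unnorm = [m.weight * lik for m, lik in zip(models, likelihoods)]
    total = sum(unnorm)
    if total <= 0:
        raise ValueError("observation has zero likelihood under every model")
    return [SampledModel(u / total, m.q_values) for u, m in zip(unnorm, models)]


def bayes_risk(models: List[SampledModel], action: int) -> float:
    """Expected loss of `action`: sum over models of
    weight * (best Q-value in that model - Q-value of `action`)."""
    return sum(m.weight * (max(m.q_values) - m.q_values[action]) for m in models)


def select_action(models: List[SampledModel],
                  risk_threshold: float) -> Tuple[Optional[int], float]:
    """Return (action, risk) for the minimum-risk action, or (None, risk)
    when that risk is above the threshold, signalling 'ask a query'."""
    n_actions = len(models[0].q_values)
    risks = [bayes_risk(models, a) for a in range(n_actions)]
    best = min(range(n_actions), key=risks.__getitem__)
    if risks[best] > risk_threshold:
        return None, risks[best]
    return best, risks[best]


# Two equally likely models that disagree about which action is best.
models = [SampledModel(0.5, [1.0, 0.0]), SampledModel(0.5, [0.0, 0.8])]
print(select_action(models, risk_threshold=0.1))   # risk 0.4 is too high: query
models = reweight(models, likelihoods=[0.9, 0.1])  # evidence favours model 0
print(select_action(models, risk_threshold=0.1))   # now action 0 is safe enough
```

In this sketch, `risk_threshold` plays the role of the trade-off between the cost of asking a query and the expected cost of acting on an uncertain model: a lower threshold makes the agent query more often.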
Keywords: Markov decision processes; Bayesian learning; POMDP models; action selection and Bayes risk
Source: VideoLectures.NET
Last reviewed: 2020-06-03, Zhang Ying (volunteer course editor)