Model-Free Reinforcement Learning as Mixture Learning |
|
Course URL: | http://videolectures.net/icml09_vlassis_mfr/
Lecturer: | Nikos Vlassis
Institution: | Technical University of Crete
Date: | 2009-08-26
Language: | English
Description: | We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm like Sarsa(1) that can be applied both in MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools that permit the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem demonstrated encouraging results.
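The non-bootstrapping optimistic policy iteration the abstract mentions (a Sarsa(1)-style method that updates each visited state-action value toward the full Monte Carlo return rather than a one-step bootstrapped target) can be sketched in tabular form. The two-state chain MDP, the function name `sarsa1`, and all parameter values below are illustrative assumptions, not taken from the lecture:

```python
import random

def sarsa1(num_episodes=2000, alpha=0.1, eps=0.3, gamma=0.9, seed=0):
    """Tabular Sarsa(1)-style learning on a hypothetical two-state chain MDP.

    Non-bootstrapping: each Q(s, a) visited in an episode is moved toward the
    full Monte Carlo return from that point (lambda = 1), not a one-step target.
    """
    rng = random.Random(seed)
    # Toy MDP (an assumption for illustration): in state 0, action 1 moves to
    # state 1; in state 1, action 1 yields reward +1 and ends the episode;
    # action 0 always ends the episode with reward 0.
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

    def step(s, a):
        if a == 0:
            return 0.0, None   # terminate with no reward
        if s == 0:
            return 0.0, 1      # move to state 1
        return 1.0, None       # reward +1, then terminate

    for _ in range(num_episodes):
        s, traj = 0, []
        while s is not None:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice((0, 1))
            else:
                a = max((0, 1), key=lambda act: Q[(s, act)])
            r, s2 = step(s, a)
            traj.append((s, a, r))
            s = s2
        # Backward pass with the full return G: the lambda = 1 (Monte Carlo,
        # non-bootstrapping) update distinguishing Sarsa(1) from Sarsa(0)
        G = 0.0
        for (s, a, r) in reversed(traj):
            G = r + gamma * G
            Q[(s, a)] += alpha * (G - Q[(s, a)])
    return Q
```

Because the target is the sampled return of the current (optimistic, epsilon-greedy) policy, the sketch fits the optimistic-policy-iteration family the abstract relates to stochastic EM; after training, the learned Q prefers action 1 in both states.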
Keywords: | model-free reinforcement learning; sampling; likelihood maximization
Source: | VideoLectures.NET
Last reviewed: | 2021-04-09: yumf
Views: | 127