

Reinforcement Learning in the Presence of Rare Events
Course URL: http://videolectures.net/icml08_frank_rlp/
Lecturer: Jordan Frank
Institution: McGill University
Date: 2008-08-04
Language: English
Course description: We consider the task of reinforcement learning in an environment in which rare significant events occur independently of the actions selected by the controlling agent. If these events are sampled according to their natural probability of occurring, convergence of standard reinforcement learning algorithms is likely to be very slow, and the learning algorithms may exhibit high variance. In this work, we assume that we have access to a simulator in which the rare-event probabilities can be artificially altered. Importance sampling can then be used to learn from this simulation data. We introduce algorithms for policy evaluation, using both tabular and function-approximation representations of the value function. We prove that in both cases the reinforcement learning algorithms converge. In the tabular case, we also analyze the bias and variance of our approach compared to TD-learning. We empirically evaluate the performance of the algorithm on random Markov Decision Processes, as well as on a large network planning task.
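The core idea in the description above can be sketched in a few lines: simulate with an inflated rare-event probability, then reweight each TD update by the likelihood ratio between the natural and altered dynamics so the value estimate remains unbiased. The following is a minimal illustrative sketch, not the authors' actual algorithm; the one-state chain, reward values, and probabilities are invented for illustration.

```python
import random

def importance_sampled_td(p_nat=0.001, p_sim=0.05, gamma=0.9,
                          r_event=-100.0, r_normal=1.0,
                          alpha=0.01, steps=200_000, seed=0):
    """Importance-sampled TD(0) on a single recurring state (toy example).

    Each step, a rare event fires with natural probability p_nat, but the
    simulator samples it with inflated probability p_sim. Weighting the TD
    update by the likelihood ratio keeps the fixed point equal to the value
    under the natural dynamics.
    """
    rng = random.Random(seed)
    v = 0.0  # tabular value estimate for the single state
    for _ in range(steps):
        event = rng.random() < p_sim  # sample under the altered dynamics
        if event:
            w, r = p_nat / p_sim, r_event  # likelihood ratio on the event branch
        else:
            w, r = (1 - p_nat) / (1 - p_sim), r_normal
        td_error = r + gamma * v - v       # one-step TD error
        v += alpha * w * td_error          # importance-weighted update
    return v

# The fixed point is E_nat[r] / (1 - gamma), since the expected importance
# weight under the simulator equals 1; here that is roughly
# (0.001 * -100 + 0.999 * 1) / 0.1 ≈ 8.99.
```

Sampling the event at its natural probability of 0.001 would require on the order of a thousand steps per observation of the event; inflating it to 0.05 lets the estimator see the rare branch often while the weight `p_nat / p_sim` corrects for the oversampling.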
Keywords: reinforcement learning; natural probability; policy evaluation; Markov decision processes
Source: VideoLectures.NET
Last reviewed: 2019-12-06 (lxf)
Views: 43