Active Reinforcement Learning
Course URL: http://videolectures.net/icml08_epshteyn_arl/
Lecturer: Arkady Epshteyn
Institution: Google
Date: 2008-08-06
Language: English
Abstract: When the transition probabilities and rewards of a Markov Decision Process (MDP) are known, the agent can obtain the optimal policy without any interaction with the environment. However, exact transition probabilities are difficult for experts to specify. One option left to the agent is a long and potentially costly exploration of the environment. In this paper, we propose an alternative: given an initial (possibly inaccurate) specification of the MDP, the agent determines the sensitivity of the optimal policy to changes in transitions and rewards. It then focuses its exploration on the regions of the state space to which the optimal policy is most sensitive. We show that the proposed exploration strategy performs well on several control and planning problems.
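The idea of ranking state-action pairs by how sensitive the optimal policy is to their transition probabilities can be illustrated with a small brute-force sketch. This is not the method from the paper, only an assumed stand-in: it shifts a fraction `eps` of probability mass of one transition distribution toward each possible next state, re-solves the MDP, and records how much value the nominal policy loses against the new optimum. The toy MDP, the function names (`value_iteration`, `evaluate_policy`, `sensitivity_scores`), and the values of `gamma` and `eps` are all hypothetical choices made for illustration.

```python
# Brute-force sensitivity sketch (illustrative assumption, not the paper's algorithm).
# P[s][a] is a list of next-state probabilities; R[s][a] is a scalar reward.

def q_values(P, R, V, gamma):
    """One-step lookahead values Q[s][a] under value estimate V."""
    return [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(P[s]))]
            for s in range(len(P))]

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Return (optimal values, greedy policy) for a known MDP."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        new_V = [max(q) for q in q_values(P, R, V, gamma)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    Q = q_values(P, R, V, gamma)
    policy = [max(range(len(q)), key=q.__getitem__) for q in Q]
    return V, policy

def evaluate_policy(P, R, policy, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Return the values of a fixed deterministic policy."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        new_V = [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                 for s, a in enumerate(policy)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V

def sensitivity_scores(P, R, gamma=0.9, eps=0.1):
    """Score each (state, action) by the worst-case value loss of the
    nominal policy when that pair's transition row is perturbed."""
    _, nominal_policy = value_iteration(P, R, gamma)
    n = len(P)
    scores = {}
    for s in range(n):
        for a in range(len(P[s])):
            worst = 0.0
            for target in range(n):
                # Move eps of the probability mass toward `target`.
                shifted = [(1 - eps) * p for p in P[s][a]]
                shifted[target] += eps
                P2 = [[list(row) for row in acts] for acts in P]
                P2[s][a] = shifted
                V_opt, _ = value_iteration(P2, R, gamma)
                V_nom = evaluate_policy(P2, R, nominal_policy, gamma)
                worst = max(worst, max(x - y for x, y in zip(V_opt, V_nom)))
            scores[(s, a)] = worst
    return nominal_policy, scores

# Hypothetical 3-state MDP: state 1 pays a reward forever, state 2 is a dead end.
P = [
    [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # state 0: risky action, safe action
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],  # state 1: stay (reward 1), stay (reward 0)
    [[0.0, 0.0, 1.0]],                   # state 2: absorbing, single action
]
R = [[0.0, 4.4], [1.0, 0.0], [0.0]]

nominal_policy, scores = sensitivity_scores(P, R)
# State-action pairs sorted from most to least sensitive.
ranked = sorted(scores, key=scores.get, reverse=True)
for pair in ranked:
    print(pair, round(scores[pair], 4))
```

Under these assumptions, an agent would spend its exploration budget on the pairs at the top of `ranked`, since errors in their transition estimates cost the nominal policy the most value. A pair scoring zero can be misestimated by `eps` without changing the outcome.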
Keywords: Markov decision process; transition probabilities; agent
Source: VideoLectures.NET
Last reviewed: 2019-04-18 (cwx)
Views: 161