
Generalization and Exploration via Value Function Randomization
Course URL: http://videolectures.net/rldm2015_van_roy_function_randomization/  
Lecturer: Ben Van Roy
Institution: Stanford University
Date: 2015-07-28
Language: English
Course description: Effective reinforcement learning calls for both efficient exploration and extrapolative generalization. I will discuss a new approach to exploration which combines the merits of provably efficient tabula rasa reinforcement learning algorithms, such as UCRL and PSRL, and algorithms that accommodate value function generalization, such as least-squares value iteration and temporal-difference learning. The former require learning times that grow with the cardinality of the state space, whereas the latter tend to be applied in conjunction with inefficient exploration schemes such as Boltzmann and epsilon-greedy exploration. Our new approach explores through randomization of value function estimates.
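Keywords: reinforcement learning; extrapolative generalization; reinforcement learning algorithms; algorithms that accommodate value function generalization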
