Generalization and Exploration via Value Function Randomization
Course URL: http://videolectures.net/rldm2015_van_roy_function_randomization/
Lecturer: Ben Van Roy
Institution: Stanford University
Date: 2015-07-28
Language: English
Abstract: Effective reinforcement learning calls for both efficient exploration and extrapolative generalization. I will discuss a new approach to exploration which combines the merits of provably efficient tabula rasa reinforcement learning algorithms, such as UCRL and PSRL, with algorithms that accommodate value function generalization, such as least-squares value iteration and temporal-difference learning. The former require learning times that grow with the cardinality of the state space, whereas the latter tend to be applied in conjunction with inefficient exploration schemes such as Boltzmann and epsilon-greedy exploration. Our new approach explores through randomization of value function estimates.
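The abstract's core idea, exploring by randomizing value function estimates instead of dithering with epsilon-greedy or Boltzmann action noise, can be sketched in a few lines. This is a minimal illustration in the spirit of randomized least-squares value iteration, not the lecture's exact algorithm; the function name, the linear feature setup, and the Gaussian noise model are assumptions made for the example.

```python
import numpy as np

def sample_randomized_q_weights(features, targets, sigma=1.0, lam=1.0, rng=None):
    """Fit linear Q-value weights to Bellman targets by regularized least
    squares, then draw one sample from the Gaussian posterior around that
    fit. Acting greedily with respect to the sampled weights explores via
    randomization of the estimate rather than via random actions.
    (Illustrative sketch; names and noise model are assumptions.)"""
    rng = rng or np.random.default_rng()
    X = np.asarray(features, dtype=float)   # (n, d) state-action features
    y = np.asarray(targets, dtype=float)    # (n,) observed Bellman targets
    d = X.shape[1]
    # Posterior precision and mean for Bayesian linear regression.
    precision = X.T @ X / sigma**2 + lam * np.eye(d)
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / sigma**2
    return rng.multivariate_normal(mean, cov)  # one randomized estimate

# Toy usage: each draw is a plausible but varied value hypothesis, so the
# greedy policy it induces differs from episode to episode.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)
w_sampled = sample_randomized_q_weights(X, y, sigma=0.1, lam=1.0, rng=rng)
```

Averaged over many draws, the samples concentrate on the regularized least-squares solution, while individual draws retain enough spread to drive systematic exploration.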
Keywords: reinforcement learning; extrapolative generalization; reinforcement learning algorithms; value function generalization algorithms
Course source: VideoLectures.NET
Data collected: 2021-11-26:zkj
Last reviewed: 2021-11-26:zkj
Views: 46