Proximal Reinforcement Learning: Learning to Act in Primal Dual Spaces |
|
Course URL: | http://videolectures.net/rldm2015_mahadevan_dual_spaces/ |
Lecturer: | Sridhar Mahadevan |
Institution: | University of Massachusetts Amherst |
Date: | 2015-07-28 |
Language: | English |
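Chinese synopsis (translated): | In this talk, we present a new framework for reinforcement learning that we have developed over the past few years. It provides mathematically rigorous solutions to longstanding fundamental questions that have remained unresolved for the past three decades: (i) how to design “safe” reinforcement learning algorithms that stay within a stable region of the parameter space; (ii) how to design true stochastic gradient temporal-difference learning algorithms and give finite-sample bounds characterizing their convergence; and (iii) more broadly, how to specify a flexible algorithmic framework that simplifies the design of reinforcement learning algorithms for various objective functions. The most important idea in solving these three problems is to connect primal and dual spaces through “mirror maps”: Legendre transforms elegantly unify and generalize a myriad of past algorithms for solving reinforcement learning problems, from natural gradient and exponentiated gradient methods to gradient TD and sparse RL methods. We introduce mirror-descent RL, a powerful family of RL methods that uses mirror maps through different Legendre transforms to achieve reliability, scalability, and sparsity. Our work builds extensively on the past 50 years of advances in stochastic optimization, from the study of proximal mappings, monotone operators, and operator splitting that began in the mid-1950s to recent advances in first-order optimization and saddle-point extragradient methods for solving variational inequalities. |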
Course description: | In this talk, we set forth a new framework for reinforcement learning that we have developed over the past few years, one that yields mathematically rigorous solutions to longstanding fundamental questions that have remained unresolved over the past three decades: (i) how to design “safe” reinforcement learning algorithms that remain in a stable region of the parameter space; (ii) how to design true stochastic gradient temporal-difference learning algorithms and give finite-sample bounds characterizing their convergence; and (iii) more broadly, how to specify a flexible algorithmic framework that simplifies the design of reinforcement learning algorithms for various objective functions. The most important idea, one that recurs as a motif throughout the solution of these three problems, is the use of primal and dual spaces connected through “mirror maps”: Legendre transforms that elegantly unify and generalize a myriad of past algorithms for solving reinforcement learning problems, from natural gradient actor-critic methods and exponentiated-gradient methods to gradient TD and sparse RL methods. We introduce mirror-descent RL, a powerful family of RL methods that uses mirror maps through different Legendre transforms to achieve reliability, scalability, and sparsity. Our work builds extensively on the past 50 years of advances in stochastic optimization, from the study of proximal mappings, monotone operators, and operator splitting that began in the mid-1950s to recent advances in first-order optimization and saddle-point extragradient methods for solving variational inequalities. |
Keywords: | new reinforcement learning framework; dual spaces; exponentiated gradient method |
Source: | VideoLectures.NET |
Data collected: | 2021-11-20:zkj |
Last reviewed: | 2021-11-20:zkj |
Views: | 76 |