On Convergence of Emphatic Temporal-Difference Learning
Course URL: https://videolectures.net/videos/colt2015_yu_difference_learning
Lecturer: Huizhen Yu
Institution: Not specified.
Date: 2025-02-04
Language: English
Course description: We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White as an improved solution to the problem of divergence of off-policy temporal-difference learning with linear function approximation. We present in this paper the first convergence proofs for two emphatic algorithms, ETD($\lambda$) and ELSTD($\lambda$). We prove, under general off-policy conditions, the convergence in $L^1$ for ELSTD($\lambda$) iterates and the almost sure convergence of the approximate value functions calculated by both algorithms. Our analysis involves new techniques with applications beyond emphatic algorithms leading, for example, to the first proof that standard TD($\lambda$) also converges under off-policy training for $\lambda$ sufficiently large.
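To make the algorithm analyzed in the lecture concrete, the following is a minimal NumPy sketch of the linear ETD($\lambda$) recursion of Sutton, Mahmood, and White, under simplifying assumptions: constant discount factor $\gamma$, constant bootstrapping parameter $\lambda$, and interest $i(s) \equiv 1$ for all states. The function name, trajectory format, step size, and the toy usage data at the end are illustrative choices, not taken from the lecture or the paper.

import numpy as np

def etd_lambda(trajectory, x, theta, alpha=0.01, gamma=0.9, lam=0.8):
    """One pass of linear ETD(lambda) for off-policy policy evaluation.

    trajectory: sequence of (s, r, s_next, rho) transitions, where
        rho = pi(a|s) / b(a|s) is the importance-sampling ratio between
        the target policy pi and the behavior policy b.
    x: feature map, state -> np.ndarray.
    theta: initial weight vector; the interest i(s) is fixed to 1 here.
    """
    theta = theta.copy()
    e = np.zeros_like(theta)       # eligibility trace e_t, with e_{-1} = 0
    F = 1.0                        # follow-on trace, F_0 = i(S_0) = 1
    rho_prev = None
    for (s, r, s_next, rho) in trajectory:
        if rho_prev is not None:   # F_t = rho_{t-1} * gamma * F_{t-1} + i(S_t)
            F = rho_prev * gamma * F + 1.0
        M = lam * 1.0 + (1.0 - lam) * F                    # emphasis M_t
        xs, xs_next = x(s), x(s_next)
        delta = r + gamma * theta @ xs_next - theta @ xs   # TD error
        e = rho * (gamma * lam * e + M * xs)               # emphatic trace
        theta = theta + alpha * delta * e                  # weight update
        rho_prev = rho
    return theta

# Hypothetical usage: a two-state chain with one-hot features.
theta = etd_lambda([(0, 1.0, 1, 1.2), (1, 0.0, 0, 0.8)],
                   x=lambda s: np.eye(2)[s], theta=np.zeros(2))
print(theta)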
Keywords: temporal-difference learning algorithms; linear function approximation; divergence problem
Source: VideoLectures.NET
Data collected: 2025-03-28 (zsp)
Last reviewed: 2025-03-28 (zsp)