A Semi-parametric Statistical Approach to Model-free Policy Evaluation
Course URL: http://videolectures.net/icml08_ueno_sps/
Lecturer: Tsuyoshi Ueno
Institution: Kyoto University
Date: 2008-08-12
Language: Japanese
Course description: Reinforcement learning (RL) methods based on least-squares temporal difference (LSTD) have been developed recently and have shown good practical performance. However, the quality of their estimation has not been well elucidated. In this article, we discuss LSTD-based policy evaluation from the new viewpoint of semiparametric statistical inference. The estimator can be obtained from a particular estimating function that guarantees its asymptotic convergence to the true value without specifying a model of the environment. Based on these observations, we 1) analyze the asymptotic variance of an LSTD-based estimator, 2) derive the optimal estimating function with the minimum asymptotic estimation variance, and 3) derive a suboptimal estimator that reduces the computational burden of obtaining the optimal estimating function.
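For readers unfamiliar with LSTD, the following is a minimal sketch of plain LSTD(0) policy evaluation, not the optimal or suboptimal estimators derived in the lecture. The three-state chain (`P`, `r`), the discount `gamma`, the tabular features, and the helper names `lstd` and `simulate` are all assumptions made for illustration. The transition matrix is used only to generate data and to compute a reference value; the estimator itself sees only the sampled trajectory, which is what "model-free" means here.

```python
import numpy as np

def lstd(states, rewards, features, gamma):
    """Plain LSTD(0): solve A theta = b from a single trajectory.

    states  : array of T+1 state indices
    rewards : array of T rewards, rewards[t] observed on states[t] -> states[t+1]
    features: (n_states, d) matrix whose rows are the feature vectors phi(s)
    """
    phi = features[states[:-1]]       # phi(s_t),     shape (T, d)
    phi_next = features[states[1:]]   # phi(s_{t+1}), shape (T, d)
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ np.asarray(rewards, dtype=float)
    return np.linalg.solve(A, b)

def simulate(P, r, T, rng):
    """Sample a trajectory of T transitions from a Markov reward process (hypothetical helper)."""
    n = P.shape[0]
    states = np.empty(T + 1, dtype=int)
    states[0] = 0
    for t in range(T):
        states[t + 1] = rng.choice(n, p=P[states[t]])
    rewards = r[states[:-1]]          # reward depends on the current state only
    return states, rewards

# Hypothetical three-state chain under a fixed policy (assumption, not from the lecture).
rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.4, 0.0, 0.6]])
r = np.array([0.0, 1.0, -1.0])
gamma = 0.9
features = np.eye(3)                  # tabular features, so theta is the value of each state

states, rewards = simulate(P, r, 50000, rng)
theta = lstd(states, rewards, features, gamma)

# Reference value computed from the model; the estimator never uses P or r directly.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
print("LSTD estimate:", theta)
print("True values:  ", v_true)
```

With tabular features the LSTD estimate should approach the reference values as the trajectory grows; the lecture's analysis concerns how quickly this happens, that is, the asymptotic variance of such estimators.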
Keywords: least-squares temporal difference; semiparametric statistics; estimator
Source: VideoLectures.NET
Last reviewed: 2019-04-21 (lxf)