


Regularization and Feature Selection in Least Squares Temporal-Difference Learning
Course URL: http://videolectures.net/icml09_kolter_rfsl/
Lecturer: J. Zico Kolter
Institution: Carnegie Mellon University
Date: 2009-09-17
Language: English
Abstract: We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (LSTD) algorithm, provide a method for learning the parameters of the value function, but when the number of features is large this algorithm can over-fit to the data and is computationally expensive. In this paper, we propose a regularization framework for the LSTD algorithm that overcomes these difficulties. In particular, we focus on the case of l1 regularization, which is robust to irrelevant features and also serves as a method for feature selection. Although the l1 regularized LSTD solution cannot be expressed as a convex optimization problem, we present an algorithm similar to the Least Angle Regression (LARS) algorithm that can efficiently compute the optimal solution. Finally, we demonstrate the performance of the algorithm experimentally.
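For context, the unregularized LSTD fit discussed in the abstract can be sketched in a few lines. The sketch below solves the standard LSTD system A w = b with A = Φᵀ(Φ − γΦ′) and b = Φᵀr, and adds an optional l2 (ridge) term for illustration; the paper's l1-regularized variant instead requires the LARS-style homotopy algorithm the lecture presents, which is not reproduced here. Function and variable names are illustrative, not from the talk.

```python
import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.95, l2=0.0):
    """Least-Squares Temporal Difference (LSTD) value-function fit.

    phi      : (n, k) features of visited states
    phi_next : (n, k) features of successor states
    rewards  : (n,)   observed rewards
    Solves A w = b with A = phi^T (phi - gamma * phi_next), b = phi^T rewards.
    l2 > 0 adds a ridge term; the l1 case from the paper needs a
    LARS-like procedure and is not shown in this sketch.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    if l2 > 0:
        A = A + l2 * np.eye(A.shape[1])  # regularize toward invertibility
    return np.linalg.solve(A, b)
```

With many features, A becomes ill-conditioned and the plain solve over-fits, which is exactly the difficulty the lecture's regularization framework addresses.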
Keywords: linear value function; least-squares temporal-difference algorithm; regularization framework
Source: VideoLectures.NET
Last reviewed: 2020-06-22 (chenxin)
Views: 110