Regularized Off-Policy TD-Learning |
Course URL: | http://videolectures.net/machine_liu_learning/ |
Lecturer: | Bo Liu |
Institution: | University of Massachusetts |
Date: | 2013-06-14 |
Language: | English |
Course description: | We present a novel l1-regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented, with a variety of experiments illustrating the off-policy convergence, sparse feature selection capability, and low computational cost of the RO-TD algorithm. |
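To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' exact RO-TD primal-dual algorithm: a TDC-style gradient-corrected TD update combined with an l1 proximal (soft-thresholding) step that produces sparse value-function weights. All function names, parameters, and the example values are hypothetical.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm: shrinks each coordinate
    # toward zero, setting small coordinates exactly to zero (sparsity).
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def tdc_l1_step(theta, w, phi, phi_next, reward, gamma, alpha, beta, lam):
    """One TDC update followed by an l1 proximal step (sketch only).

    theta    -- primary value-function weights
    w        -- auxiliary weights used by TDC's gradient correction
    phi      -- feature vector of the current state
    phi_next -- feature vector of the next state
    """
    # TD error for the linear value-function approximation.
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # TDC main update: plain TD step plus a correction term that makes
    # the update a stochastic gradient of the MSPBE (off-policy convergent).
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    # Auxiliary weights track a least-squares estimate of delta given phi.
    w = w + beta * (delta - phi @ w) * phi
    # Online convex regularization: l1 shrinkage encourages sparse theta.
    theta = soft_threshold(theta, alpha * lam)
    return theta, w
```

For example, starting from zero weights with `phi = [1, 0, 0]`, `phi_next = [0, 1, 0]`, `reward = 1`, `gamma = 0.9`, `alpha = beta = 0.1`, and `lam = 0.01`, a single step yields `theta = [0.099, 0, 0]`: the TD step moves the first weight to 0.1 and the shrinkage then subtracts `alpha * lam = 0.001`. The saddle-point machinery in the actual paper replaces this proximal step with a first-order primal-dual solver, but the shrinkage step shows where the sparsity comes from.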
Keywords: | regularization; feature selection; off-policy |
Source: | VideoLectures.NET |
Last reviewed: | 2020-07-13 (yumf) |
Views: | 74 |