Linear Complementarity for Regularized Policy Evaluation and Improvement
Course URL: http://videolectures.net/nips2010_johns_lcr/
Lecturer: Jeff Johns
Institution: Duke University
Date: 2011-01-12
Language: English
Course description: Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1-regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a "greedy" homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
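
To make the formulation concrete: a linear complementarity problem asks, given a vector q and a matrix M, for z >= 0 such that w = q + Mz >= 0 and z^T w = 0. Below is a minimal sketch, not the authors' code: the function names, the synthetic data, and the use of a projected Gauss-Seidel solver are illustrative assumptions. Writing A = Phi^T (Phi - gamma * Phi') and b = Phi^T R for sampled current-state features Phi, next-state features Phi', and rewards R, and splitting the weight vector into nonnegative parts z = [x+; x-], the L1-regularized linear fixed point conditions take the LCP form with M = [[A, -A], [-A, A]] and q = [beta - b; beta + b].

```python
import numpy as np

def l1_td_lcp(Phi, PhiNext, R, gamma, beta):
    """Build LCP data (q, M) whose solution z = [x+; x-] encodes the
    L1-regularized linear TD fixed point x = x+ - x-."""
    A = Phi.T @ (Phi - gamma * PhiNext)   # unregularized TD would solve A x = b
    b = Phi.T @ R
    k = A.shape[0]
    # Complementarity encodes the L1 optimality conditions
    # |Phi^T (R + gamma*Phi' x - Phi x)| <= beta, with equality on the
    # active (nonzero) coordinates of x.
    M = np.block([[A, -A], [-A, A]])
    q = np.concatenate([beta * np.ones(k) - b, beta * np.ones(k) + b])
    return q, M

def projected_gauss_seidel(q, M, z0=None, iters=5000, tol=1e-10):
    """Toy iterative LCP solver: find z >= 0 with w = q + M z >= 0 and
    z . w = 0. z0 is the warm-start hook; convergence is only assured for
    well-behaved M, which is why the paper uses off-the-shelf solvers."""
    n = q.size
    z = np.zeros(n) if z0 is None else z0.astype(float)
    for _ in range(iters):
        z_prev = z.copy()
        for i in range(n):
            rest = q[i] + M[i] @ z - M[i, i] * z[i]  # q_i + sum_{j != i} M_ij z_j
            z[i] = max(0.0, -rest / M[i, i]) if M[i, i] > 0 else 0.0
        if np.max(np.abs(z - z_prev)) < tol:
            break
    return z

# Tiny synthetic demonstration for a single fixed policy.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 10))       # features at sampled states
PhiNext = rng.normal(size=(200, 10))   # features at successor states
R = rng.normal(size=200)               # sampled rewards
q, M = l1_td_lcp(Phi, PhiNext, R, gamma=0.9, beta=5.0)
z = projected_gauss_seidel(q, M)
x = z[:10] - z[10:]                    # sparse TD weight vector
```

The z0 argument is where warm starts would enter: inside policy iteration, the solution for the previous policy seeds the solve for the next one, which, together with efficient off-the-shelf pivoting solvers, is what the abstract credits with speeding up policy iteration.
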
Keywords: reinforcement learning; L1 regularization; linear complementarity problem
Source: VideoLectures.NET
Last reviewed: 2019-09-06 (lxf)