Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains
Course URL: http://videolectures.net/icml08_driessens_npp/
Lecturer: Kurt Driessens
Institution: Katholieke Universiteit Leuven
Date: 2008-08-04
Language: English
Abstract: Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult -- if not impossible -- to apply them within structured domains, in which, e.g., there is a varying number of objects and relations among them. In this paper, we describe a non-parametric policy gradient approach -- called NPPG -- that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
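The abstract's key idea -- a policy represented as a boosted, weighted sum of regression models, grown stage-wise along a functional gradient of the expected return -- can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's setup: a toy corridor environment, a Gibbs (softmax) policy over the boosted potential, REINFORCE-style return estimates for the functional gradient, and a table-averaging regressor standing in for an off-the-shelf learner such as a regression tree.

```python
import math
import random
from collections import defaultdict

random.seed(0)

# Toy corridor MDP (illustrative, not from the paper): cells 0..4,
# actions move left/right, reward 1 for reaching cell 4.
N_STATES = 5
ACTIONS = (-1, +1)
GAMMA = 0.95
ETA = 0.5  # boosting step size (assumed constant here)

class TableRegressor:
    """Stand-in for an off-the-shelf regression learner: fits a
    (state, action) -> value map by averaging the training targets."""
    def __init__(self):
        self.table = defaultdict(float)
    def fit(self, examples):  # examples: list of ((s, a), target)
        sums, counts = defaultdict(float), defaultdict(int)
        for (s, a), g in examples:
            sums[(s, a)] += g
            counts[(s, a)] += 1
        for key in sums:
            self.table[key] = sums[key] / counts[key]
    def predict(self, s, a):
        return self.table[(s, a)]

models = []  # the boosted ensemble: Psi(s, a) = ETA * sum_m h_m(s, a)

def psi(s, a):
    return ETA * sum(h.predict(s, a) for h in models)

def policy(s):
    """Gibbs policy over the boosted potential Psi."""
    prefs = [math.exp(psi(s, a)) for a in ACTIONS]
    z = sum(prefs)
    return [p / z for p in prefs]

def run_episode(max_steps=20):
    s, traj = 0, []
    for _ in range(max_steps):
        a = random.choices(ACTIONS, policy(s))[0]
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        traj.append((s, a, r))
        s = s2
        if r > 0:
            break
    return traj

for _ in range(30):  # stage-wise boosting iterations
    examples = []
    for _ in range(10):  # sampled episodes per iteration
        traj = run_episode()
        G, returns = 0.0, []
        for (_, _, r) in reversed(traj):
            G = r + GAMMA * G
            returns.append(G)
        returns.reverse()
        # Point-wise functional gradient of the REINFORCE objective at
        # visited states: d log pi(a|s) / d Psi(s,b) = 1[b=a] - pi(b|s),
        # scaled by the observed return G_t.
        for (s, a, _), G_t in zip(traj, returns):
            probs = policy(s)
            for b, pb in zip(ACTIONS, probs):
                examples.append(((s, b), G_t * ((1.0 if b == a else 0.0) - pb)))
    h = TableRegressor()
    h.fit(examples)     # fit a regressor to the gradient estimates
    models.append(h)    # add it to the weighted ensemble

print(policy(0))  # action probabilities in the start state after training
```

After training, the policy at the start state should strongly prefer moving right. Swapping `TableRegressor` for a relational regression-tree learner is what lets the same loop handle relational state descriptions, since only the regressor ever inspects the states.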
Keywords: policy gradient methods; functional gradient
Source: VideoLectures.NET
Last reviewed: 2020-06-03 by 张荧 (volunteer course editor)
Views: 155