Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains |
|
Course URL: | http://videolectures.net/icml08_driessens_npp/
Lecturer: | Kurt Driessens
Institution: | KU Leuven
Date: | 2008-08-04
Language: | English
Abstract: | Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult -- if not impossible -- to apply them within structured domains, in which e.g. there is a varying number of objects and relations among them. In this paper, we describe a non-parametric policy gradient approach -- called NPPG -- that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
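The abstract's key idea — a policy represented as a weighted sum of regression models, grown stage-wise by fitting each new model to the functional gradient of expected reward — can be illustrated with a small sketch. This is not the paper's implementation: the toy two-action contextual bandit, the `BucketRegressor` stand-in for an off-the-shelf regression learner, and all names here are illustrative assumptions; only the overall scheme (Gibbs/softmax policy over a boosted potential, REINFORCE-style pointwise gradients as regression targets) follows the abstract.

```python
import math
import random

ACTIONS = (0, 1)

def reward(context, action):
    # Toy domain (assumption): action 1 pays off when context > 0.5, else action 0.
    return 1.0 if (context > 0.5) == (action == 1) else 0.0

class BucketRegressor:
    """Trivial regressor standing in for an off-the-shelf learner:
    predicts the mean target within each (context > 0.5, action) bucket."""
    def fit(self, xs, ys):
        sums, counts = {}, {}
        for (c, a), y in zip(xs, ys):
            k = (c > 0.5, a)
            sums[k] = sums.get(k, 0.0) + y
            counts[k] = counts.get(k, 0) + 1
        self.means = {k: sums[k] / counts[k] for k in sums}
    def predict(self, c, a):
        return self.means.get((c > 0.5, a), 0.0)

def psi(models, c, a):
    # Potential = weighted sum of the boosted regression models.
    return sum(lr * m.predict(c, a) for lr, m in models)

def policy(models, c):
    # Gibbs (softmax) policy over the potential function.
    ws = [math.exp(psi(models, c, a)) for a in ACTIONS]
    z = sum(ws)
    return [w / z for w in ws]

def train(stages=30, episodes=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(stages):  # stage-wise optimization
        xs, ys = [], []
        for _ in range(episodes):
            c = rng.random()
            p = policy(models, c)
            a = 0 if rng.random() < p[0] else 1
            r = reward(c, a)
            # Pointwise functional gradient of expected reward under the
            # Gibbs policy: r * (indicator(a' = a) - pi(a' | s)).
            for a2 in ACTIONS:
                xs.append((c, a2))
                ys.append(r * ((1.0 if a2 == a else 0.0) - p[a2]))
        m = BucketRegressor()
        m.fit(xs, ys)  # fit the new model to the gradient targets
        models.append((lr, m))
    return models
```

Each stage leaves all earlier models untouched and only appends a new one, which is what makes the policy representation non-parametric: its capacity grows with the number of boosting stages rather than being fixed in advance.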
Keywords: | policy gradient methods; functional gradient
Source: | VideoLectures.NET
Last reviewed: | 2020-06-03: 张荧 (volunteer course editor)
Views: | 155