Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains |
|
Course URL: | http://videolectures.net/icml08_driessens_npp/
Lecturer: | Kurt Driessens
Institution: | KU Leuven
Date: | 2008-08-04
Language: | English
Abstract: | Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult -- if not impossible -- to apply them within structured domains, in which e.g. there is a varying number of objects and relations among them. In this paper, we describe a non-parametric policy gradient approach -- called NPPG -- that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
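The abstract's key idea — a policy represented as a weighted sum of regression models, grown stage-wise by fitting each new model to the functional gradient of expected reward — can be illustrated with a small sketch. This is not the paper's implementation: the toy two-action contextual bandit, the `BucketRegressor` stand-in for an off-the-shelf regression learner, and all names here are illustrative assumptions; only the overall scheme (Gibbs/softmax policy over a boosted potential, REINFORCE-style pointwise gradients as regression targets) follows the abstract.

```python
import math
import random

ACTIONS = (0, 1)

def reward(context, action):
    # Toy domain (assumption): action 1 pays off when context > 0.5, else action 0.
    return 1.0 if (context > 0.5) == (action == 1) else 0.0

class BucketRegressor:
    """Trivial regressor standing in for an off-the-shelf learner:
    predicts the mean target within each (context > 0.5, action) bucket."""
    def fit(self, xs, ys):
        sums, counts = {}, {}
        for (c, a), y in zip(xs, ys):
            k = (c > 0.5, a)
            sums[k] = sums.get(k, 0.0) + y
            counts[k] = counts.get(k, 0) + 1
        self.means = {k: sums[k] / counts[k] for k in sums}
    def predict(self, c, a):
        return self.means.get((c > 0.5, a), 0.0)

def psi(models, c, a):
    # Potential = weighted sum of the boosted regression models.
    return sum(lr * m.predict(c, a) for lr, m in models)

def policy(models, c):
    # Gibbs (softmax) policy over the potential function.
    ws = [math.exp(psi(models, c, a)) for a in ACTIONS]
    z = sum(ws)
    return [w / z for w in ws]

def train(stages=30, episodes=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(stages):  # stage-wise optimization
        xs, ys = [], []
        for _ in range(episodes):
            c = rng.random()
            p = policy(models, c)
            a = 0 if rng.random() < p[0] else 1
            r = reward(c, a)
            # Pointwise functional gradient of expected reward under the
            # Gibbs policy: r * (indicator(a' = a) - pi(a' | s)).
            for a2 in ACTIONS:
                xs.append((c, a2))
                ys.append(r * ((1.0 if a2 == a else 0.0) - p[a2]))
        m = BucketRegressor()
        m.fit(xs, ys)  # fit the new model to the gradient targets
        models.append((lr, m))
    return models
```

Each stage leaves all earlier models untouched and only appends a new one, which is what makes the policy representation non-parametric: its capacity grows with the number of boosting stages rather than being fixed in advance.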
Keywords: | policy gradient methods; functional gradient
Source: | VideoLectures.NET
Last reviewed: | 2020-06-03: 张荧 (volunteer course editor)
Views: | 155