Practical RL: Representation, Interaction, Synthesis, and Mortality (PRISM) |
|
Course URL: | http://videolectures.net/rldm2015_stone_practical_rl/ |
Lecturer: | Peter Stone |
Institution: | University of Texas |
Date: | 2015-07-28 |
Language: | English |
Course description: | When scaling up Reinforcement Learning (RL) to large continuous domains with imperfect representations and hierarchical structure, we often try applying algorithms that are proven to converge in small finite domains, and then just hope for the best. This talk will advocate instead designing algorithms that adhere to the constraints, and indeed take advantage of the opportunities, that might come with the problem at hand. Drawing on several different research threads within the Learning Agents Research Group at UT Austin, I will touch on four types of issues that arise from these constraints and opportunities: 1) Representation - choosing the algorithm for the problem’s representation and adapting the representation to fit the algorithm; 2) Interaction - with other agents and with human trainers; 3) Synthesis - of different algorithms for the same problem and of different concepts in the same algorithm; and 4) Mortality - dealing with the constraint that when the environment is large relative to the number of action opportunities available, one cannot explore exhaustively. Within this context, I will focus on two specific RL approaches: the TEXPLORE algorithm for real-time, sample-efficient reinforcement learning for robots, and layered learning, a hierarchical machine learning paradigm that enables learning of complex behaviors by incrementally learning a series of sub-behaviors. TEXPLORE has been implemented and tested on a full-size, fully autonomous robot car, and layered learning was the key deciding factor in our RoboCup 2014 3D simulation league championship. |
Keywords: | representation; interaction; synthesis |
Source: | VideoLectures.NET |
Data collected: | 2021-01-04:yxd |
Last reviewed: | 2021-01-08:yumf |
Views: | 27 |