Practical RL: Representation, Interaction, Synthesis, and Mortality (PRISM)
Course URL: http://videolectures.net/rldm2015_stone_practical_rl/
Instructor: Peter Stone
Institution: University of Texas
Date: 2015-07-28
Language: English
Abstract: When scaling up Reinforcement Learning (RL) to large continuous domains with imperfect representations and hierarchical structure, we often try applying algorithms that are proven to converge in small finite domains, and then just hope for the best. This talk will advocate instead designing algorithms that adhere to the constraints, and indeed take advantage of the opportunities, that might come with the problem at hand. Drawing on several different research threads within the Learning Agents Research Group at UT Austin, I will touch on four types of issues that arise from these constraints and opportunities: 1) Representation: choosing the algorithm for the problem’s representation and adapting the representation to fit the algorithm; 2) Interaction: with other agents and with human trainers; 3) Synthesis: of different algorithms for the same problem and of different concepts in the same algorithm; and 4) Mortality: dealing with the constraint that when the environment is large relative to the number of action opportunities available, one cannot explore exhaustively. Within this context, I will focus on two specific RL approaches: the TEXPLORE algorithm for real-time, sample-efficient reinforcement learning for robots, and layered learning, a hierarchical machine learning paradigm that enables learning of complex behaviors by incrementally learning a series of sub-behaviors. TEXPLORE has been implemented and tested on a full-size, fully autonomous robot car, and layered learning was the key deciding factor in our RoboCup 2014 3D simulation league championship.
Keywords: representation; interaction; mortality
Source: VideoLectures.NET
Data collected: 2020-11-26:yxd
Last reviewed: 2020-12-15:cjy
Views: 45