Reinforcement Learning for Embodied Cognition
Course URL: http://videolectures.net/nips2010_ballard_rlec/
Lecturer: Dana H. Ballard
Institution: University of Texas
Date: 2011-01-12
Language: English
Course description: The enormous progress in instrumentation for measuring brain states has made it possible to tackle the large issue of an overall model of brain computation. The intrinsic complexity of the brain can lead one to set aside issues related to its relationship with the body, but the field of Embodied Cognition stresses that understanding brain function at the system level requires one to address the role of the brain-body interface. While it is obvious that the brain receives all its input through the senses and directs its outputs through the motor system, it has only recently been appreciated that the body interface performs huge amounts of computation that does not have to be repeated by the brain, and thus affords the brain great simplifications in its representations. In effect, the brain's abstract states can explicitly or implicitly refer to coded representations of the world created by the body. Even if the brain can communicate with the world through abstractions, the severe speed limitations of its neural circuitry mean that vast amounts of indexing must be performed during development so that appropriate behavioral responses can be rapidly accessed. One way this could happen would be if the brain used some kind of decomposition whereby behavioral primitives could be quickly accessed and combined. Such a factorization has huge synergies with embodied cognition models, which can use the natural filtering imposed by the body in directing behavior to select relevant primitives. These advantages can be explored with virtual environments populated with humanoid avatars. Such settings allow experimental parameters to be manipulated systematically. Our test settings are everyday natural environments, such as walking and driving in a small town, and making sandwiches and looking for lost items in an apartment. The issues we focus on center on the programming of the individual behavioral primitives using reinforcement learning (RL).
The central issues are eye fixation programming, credit assignment to individual behavioral modules, and learning the value of behaviors via inverse reinforcement learning. Eye fixation programming: eye fixations are the central information-gathering method used by humans, yet the protocols for programming them are still unsettled. We show that information gain in an RL setting can potentially explain experimental data. Credit assignment: if behaviors are to be decomposed into individual modules, then dividing received reward among them becomes a major issue. We show that Bayesian estimation techniques, used in the RL setting, resolve this issue efficiently. Inverse reinforcement learning: one way to learn new behaviors would be for a human agent to imitate them and learn their value. We show that an efficient algorithm developed by Rothkopf can estimate the value of behaviors from observed data using Bayesian RL techniques.
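The credit-assignment problem named in the description can be illustrated with a small sketch. It assumes that only the summed reward of the currently active modules is observed, and it uses a plain running-average update rather than the Bayesian estimator mentioned in the talk. The function name `estimate_module_rewards`, the learning rate `beta`, and the random activation scheme are all hypothetical choices made for illustration, not details from the lecture.

```python
import random


def estimate_module_rewards(true_rewards, n_steps=5000, beta=0.1, seed=0):
    """Recover per-module rewards when only their sum is observed.

    Illustrative assumption: on each step a random subset of modules is
    active and the agent sees one global reward, the sum of the active
    modules' (hidden) rewards.
    """
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n
    for _ in range(n_steps):
        # Pick the modules that happen to be active on this step.
        active = [i for i in range(n) if rng.random() < 0.5]
        if not active:
            continue
        # Only the summed reward is observable.
        global_reward = sum(true_rewards[i] for i in active)
        total_estimate = sum(estimates[i] for i in active)
        updated = {}
        for i in active:
            # Reward left over after subtracting what the other active
            # modules currently claim for themselves.
            residual = global_reward - (total_estimate - estimates[i])
            updated[i] = (1.0 - beta) * estimates[i] + beta * residual
        # Apply all updates together so each uses the same old estimates.
        for i, value in updated.items():
            estimates[i] = value
    return estimates


if __name__ == "__main__":
    hidden = [1.0, -0.5, 2.0]
    print(estimate_module_rewards(hidden))
```

Because the set of active modules varies from step to step, each module's share can be separated from the others. If the same modules were always active together, only their combined reward could be identified.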
Keywords: brain measurement; instrumentation; communication with the world
Source: VideoLectures.NET
Last reviewed: 2020-06-08:yumf
Views: 60