Preference elicitation and inverse reinforcement learning
Course URL: http://videolectures.net/ecmlpkdd2011_rothkopf_elicitation/
Lecturer: Constantin A. Rothkopf
Institution: Frankfurt Institute for Advanced Studies
Date: 2011-11-30
Language: English
Course description: We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent's preferences, policy and, optionally, the obtained reward sequence, from observations. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning via analysis and experimental results. We show that preferences can be determined accurately, even if the observed agent's policy is sub-optimal with respect to its own preferences. In that case, significantly improved policies with respect to the agent's preferences are obtained, compared both to other methods and to the performance of the demonstrated policy.
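As a rough illustration of the kind of Bayesian formulation the description refers to (a generic Bayesian inverse reinforcement learning posterior, not necessarily the exact model presented in the talk), the agent's preference parameters $\theta$ can be inferred from observed demonstrations $D = \{(s_t, a_t)\}_{t=1}^{T}$ via

    p(\theta \mid D) \;\propto\; p(\theta) \prod_{t=1}^{T} \pi_\theta(a_t \mid s_t),
    \qquad
    \pi_\theta(a \mid s) \;=\; \frac{\exp\bigl(\beta\, Q^{*}_{\theta}(s,a)\bigr)}{\sum_{a'} \exp\bigl(\beta\, Q^{*}_{\theta}(s,a')\bigr)},

where $Q^{*}_{\theta}$ is the optimal action-value function under the reward implied by $\theta$, and the inverse temperature $\beta$ allows the demonstrator to act sub-optimally with respect to its own preferences. A policy that improves on the demonstration can then be obtained by acting optimally under the posterior-expected preferences.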
Keywords: Bayesian statistical formulation; inverse reinforcement learning; policy performance
Source: VideoLectures.NET
Last reviewed: 2020-05-18 by 王淑红 (volunteer course editor)
Views: 1218