Active Learning for Reward Estimation in Inverse Reinforcement Learning
Course URL: | https://videolectures.net/videos/ecmlpkdd09_melo_alreirl
Lecturer: | Francisco S. Melo
Institution: | Not specified.
Date: | 2009-10-20
Language: | English
Description: | Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at "arbitrary" states. The purpose of our algorithm is to estimate the reward function with accuracy similar to that of other methods from the literature while reducing the number of policy samples required from the expert. We also discuss the use of our algorithm in higher-dimensional problems, using both Monte Carlo and gradient methods. We present illustrative results of our algorithm in several simulated examples of different complexities.
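The core idea in the abstract, querying the demonstrator at the states where the current reward estimate leaves the policy most uncertain rather than at arbitrary states, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's algorithm: the toy MDP, the helper names `greedy_policy` and `expert_action`, the disagreement score, and the reweighting update are all assumptions standing in for a real planner, the actual demonstrator, and the paper's Bayesian posterior update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem sizes (illustrative only).
N_STATES, N_ACTIONS, N_SAMPLES = 10, 4, 50

# Monte Carlo stand-in for a posterior over reward functions:
# sampled reward vectors with log-weights, all equally likely at first.
reward_samples = rng.normal(size=(N_SAMPLES, N_STATES))
log_weights = np.zeros(N_SAMPLES)

def greedy_policy(reward):
    """Hypothetical stand-in for an MDP solver: maps a reward vector to a
    deterministic per-state action. A real implementation would plan in
    the actual MDP."""
    return np.outer(reward, np.arange(1, N_ACTIONS + 1)).argmax(axis=1)

def policy_disagreement(state):
    """How much the weighted reward samples disagree about the expert's
    action at `state` (0 = consensus, closer to 1 = high uncertainty)."""
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    action_probs = np.zeros(N_ACTIONS)
    for r, wi in zip(reward_samples, w):
        action_probs[greedy_policy(r)[state]] += wi
    return 1.0 - action_probs.max()

TRUE_REWARD = np.linspace(-1.0, 1.0, N_STATES)

def expert_action(state):
    """Stand-in for querying the demonstrator."""
    return greedy_policy(TRUE_REWARD)[state]

for step in range(5):
    # Active query: ask at the state where the posterior over policies
    # is most uncertain, instead of at an arbitrary state.
    scores = [policy_disagreement(s) for s in range(N_STATES)]
    s_query = int(np.argmax(scores))
    a = expert_action(s_query)
    # Down-weight reward samples inconsistent with the observed action
    # (a crude stand-in for the paper's posterior update).
    for i, r in enumerate(reward_samples):
        if greedy_policy(r)[s_query] != a:
            log_weights[i] -= 2.0
    print(f"step {step}: queried state {s_query} "
          f"(disagreement {scores[s_query]:.2f}), expert action {a}")
```

In the lecture's setting, the uncertainty would come from a proper posterior over rewards, updated by Monte Carlo (or, in higher dimensions, gradient) methods; the loop structure, scoring states, querying the most informative one, and updating the estimate, is the part the sketch is meant to convey.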
Keywords: | Monte Carlo; gradient methods; inverse reinforcement learning
Source: | VideoLectures.NET
Data collected: | 2025-04-21: zsp
Last reviewed: | 2025-04-21: zsp
Views: | 2