

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Course URL: http://videolectures.net/aistats2011_ross_reduction/
Lecturer: Stephane Ross
Institution: Carnegie Mellon University
Date: 2011-05-06
Language: English
Abstract: Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., 2009; Ross and Bagnell, 2010) provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting. We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
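The iterative procedure the abstract describes can be sketched roughly as follows. This is a minimal DAgger-style illustration, not the paper's implementation: the `ChainEnv` environment, the always-move-right expert, and the majority-vote `train_classifier` are hypothetical stand-ins chosen only to make the loop runnable.

```python
from collections import Counter, defaultdict

def dagger(env, expert, train_classifier, n_iters=10, horizon=20):
    """Iteratively aggregate expert labels under the current policy's own
    observation distribution, then retrain a single stationary deterministic
    policy on all data collected so far (DAgger-style sketch)."""
    dataset = []          # aggregated (observation, expert_action) pairs
    policy = expert       # first iteration: roll out the expert itself
    for _ in range(n_iters):
        obs = env.reset()
        for _ in range(horizon):
            # Query the expert on states the *current* policy visits.
            dataset.append((obs, expert(obs)))
            obs = env.step(policy(obs))
        # Retrain one stationary deterministic policy on the aggregate.
        policy = train_classifier(dataset)
    return policy

# Toy demo: 1-D chain where the expert always moves right (action +1).
class ChainEnv:
    def __init__(self, size=5):
        self.size, self.pos = size, 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return self.pos

def expert(obs):
    return +1             # hypothetical expert: always move right

def train_classifier(dataset):
    # Majority vote per observation -- a stand-in for a real supervised learner.
    votes = defaultdict(Counter)
    for obs, act in dataset:
        votes[obs][act] += 1
    table = {obs: c.most_common(1)[0][0] for obs, c in votes.items()}
    return lambda obs: table.get(obs, +1)

policy = dagger(ChainEnv(), expert, train_classifier, n_iters=3, horizon=5)
```

Because each retraining step fits the policy on observations induced by the previous policies, the no-regret property of the underlying online learner yields the paper's performance guarantee.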
Keywords: imitation learning; structured prediction
Source: VideoLectures.NET
Last reviewed: 2020-09-24 (dingaq)