Apprenticeship Learning Using Linear Programming
Course URL: http://videolectures.net/icml08_syed_alu/
Lecturer: Umar Syed
Institution: University of Pennsylvania
Date: 2008-08-12
Language: English
Course description: In apprenticeship learning, the goal is to learn a policy in a Markov decision process (MDP) that is at least as good as a policy demonstrated by an expert. The difficulty is that the MDP's true reward function is assumed to be unknown. We show how to frame apprenticeship learning as a linear programming (LP) problem, and show that using an off-the-shelf LP solver to solve this problem substantially improves running time over existing methods, by up to two orders of magnitude in our experiments. Additionally, our approach produces stationary policies, while all existing methods for apprenticeship learning output policies that are "mixed", i.e. randomized combinations of stationary policies. The technique used is general enough to convert any mixed policy to a stationary policy.
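The description does not spell out how a mixed policy is converted to a stationary one. A minimal sketch of one standard construction, assuming the conversion goes through discounted state-action occupancy measures: average the occupancy measures of the component policies using the mixing weights, then normalize per state. The two-state MDP, the names `P`, `ALPHA`, `GAMMA`, and both helper functions are hypothetical and chosen only for illustration.

```python
GAMMA = 0.9  # discount factor (assumed for this toy example)

# P[s][a][s2]: transition probabilities of a hypothetical 2-state, 2-action MDP
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
]
ALPHA = [0.6, 0.4]  # initial state distribution (all entries positive)


def occupancy(policy, steps=2000):
    """Approximate x[s][a] = sum_t GAMMA^t * Pr(s_t = s, a_t = a) by truncation."""
    n_s, n_a = len(P), len(P[0])
    x = [[0.0] * n_a for _ in range(n_s)]
    d = list(ALPHA)  # state distribution at the current time step
    discount = 1.0
    for _ in range(steps):
        nxt = [0.0] * n_s
        for s in range(n_s):
            for a in range(n_a):
                w = d[s] * policy[s][a]
                x[s][a] += discount * w
                for s2 in range(n_s):
                    nxt[s2] += w * P[s][a][s2]
        d = nxt
        discount *= GAMMA
    return x


def mix_to_stationary(policies, weights):
    """Return the mixture's occupancy measure and the stationary policy it induces."""
    occs = [occupancy(p) for p in policies]
    n_s, n_a = len(P), len(P[0])
    x_mix = [
        [sum(w * x[s][a] for w, x in zip(weights, occs)) for a in range(n_a)]
        for s in range(n_s)
    ]
    # Normalizing each state's row gives action probabilities for that state.
    stationary = [[x_mix[s][a] / sum(x_mix[s]) for a in range(n_a)] for s in range(n_s)]
    return x_mix, stationary


pi_0 = [[1.0, 0.0], [1.0, 0.0]]  # always take action 0
pi_1 = [[0.0, 1.0], [0.0, 1.0]]  # always take action 1
x_mix, pi_stat = mix_to_stationary([pi_0, pi_1], [0.3, 0.7])

# The stationary policy should reproduce the mixture's occupancy measure.
x_stat = occupancy(pi_stat)
gap = max(abs(x_mix[s][a] - x_stat[s][a]) for s in range(2) for a in range(2))
print("max occupancy gap:", gap)
```

Because expected discounted reward is linear in the occupancy measure, matching occupancy measures means the stationary policy earns the same value as the mixed policy under any reward function, which is why the unknown reward poses no obstacle to this step.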
Keywords: apprenticeship learning; Markov decision process; linear programming; stationary policy
Source: VideoLectures.NET
Last reviewed: 2019-12-07 (lxf)
Views: 48