Off-policy Model-based Learning under Unknown Factored Dynamics
Course URL: http://videolectures.net/icml2015_hallak_unknown_factored_dynamic...
Lecturer: Assaf Hallak
Institution: Technion - Israel Institute of Technology
Date: 2015-09-27
Language: English
Course description: Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use. But how can we prove superiority without testing the new policy? To answer this question, we introduce the G-SCOPE algorithm, which evaluates a new policy based on data generated by the existing policy. Our algorithm is both computationally and sample efficient because it greedily learns to exploit factored structure in the dynamics of the environment. We present a finite sample analysis of our approach and show through experiments that the algorithm scales well on high-dimensional problems with few samples.
Keywords: dynamic decision making; data evaluation; high-dimensional problems
Course source: VideoLectures.NET
Data collected: 2022-11-06:chenjy
Last reviewed: 2022-11-06:chenjy