The optimistic principle for online planning in Markov decision processes
Course URL: | http://videolectures.net/onlinelearning2012_munos_optimistic_prin... |
Lecturer: | Rémi Munos |
Institution: | INRIA (Institut national de recherche en informatique et en automatique, France) |
Date: | 2013-05-28 |
Language: | English |
Abstract: | Given an initial state, what is the best possible action that a planning algorithm can return when it is given a finite numerical budget (e.g., the number of calls to a model of the state-transition and reward functions)? We investigate optimistic strategies and provide regret bounds in terms of a new measure of the complexity of the planning problem. |
Keywords: | online learning; decision support; algorithms |
Source: | VideoLectures.NET |
Last edited: | 2020-07-17: yumf |
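The abstract does not spell out an algorithm, so the following is only a minimal sketch of what an optimistic strategy under a budget of model calls can look like. It assumes a deterministic MDP with a finite action set, rewards in [0, 1] and a discount factor 0 <= gamma < 1, and counts every call to the model against the budget. Each node of the look-ahead tree (a sequence of d actions) keeps its discounted reward u as a lower bound and u + gamma^d / (1 - gamma) as an upper bound (b-value); the planner always expands the leaf with the largest b-value. This is in the spirit of optimistic planning for deterministic systems, not necessarily the method presented in the lecture; the names `optimistic_plan` and `toy_model`, and the choice to recommend the first action of the sequence with the largest u, are hypothetical.

```python
import heapq
from typing import Callable, Hashable, List, Optional, Tuple

# Hypothetical deterministic generative model: (state, action) -> (next_state, reward).
Model = Callable[[Hashable, int], Tuple[Hashable, float]]


def optimistic_plan(model: Model, s0: Hashable, num_actions: int,
                    gamma: float, budget: int) -> int:
    """Hypothetical sketch of optimistic planning in a deterministic MDP.

    Assumes rewards in [0, 1] and 0 <= gamma < 1; every call to `model`
    counts against `budget`. Returns the first action of the explored
    sequence with the largest discounted reward u (a lower bound), or
    action 0 if the budget is too small to expand the root.
    """
    # Heap entries: (-b_value, tie_breaker, depth, u, state, first_action).
    # u = sum_{t<depth} gamma^t r_t lower-bounds the value of any extension;
    # b = u + gamma^depth / (1 - gamma) upper-bounds it, since rewards <= 1.
    leaves: List[Tuple[float, int, int, float, Hashable, Optional[int]]] = []
    heapq.heappush(leaves, (-1.0 / (1.0 - gamma), 0, 0, 0.0, s0, None))
    tie = 1
    calls = 0
    best_u, best_action = float("-inf"), 0
    # Expanding a leaf costs one model call per action.
    while calls + num_actions <= budget:
        # Optimism: expand the leaf whose upper bound is highest.
        _, _, depth, u, state, first = heapq.heappop(leaves)
        for a in range(num_actions):
            next_state, reward = model(state, a)
            calls += 1
            child_u = u + gamma ** depth * reward
            child_b = child_u + gamma ** (depth + 1) / (1.0 - gamma)
            child_first = a if first is None else first
            heapq.heappush(leaves, (-child_b, tie, depth + 1, child_u,
                                    next_state, child_first))
            tie += 1
            if child_u > best_u:
                best_u, best_action = child_u, child_first
    return best_action


def toy_model(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical 2-state MDP: in state 0, action 0 pays 0.5 and stays,
    action 1 pays 0 but moves to state 1, where every action pays 1."""
    if state == 0:
        return (0, 0.5) if action == 0 else (1, 0.0)
    return (1, 1.0)


print(optimistic_plan(toy_model, 0, num_actions=2, gamma=0.9, budget=2))    # 0
print(optimistic_plan(toy_model, 0, num_actions=2, gamma=0.9, budget=200))  # 1
```

Because a leaf's b-value bounds the value of every sequence below it, model calls are only spent on sequences that might still be optimal. In the toy model, a budget of 2 calls yields the myopic action 0, while a larger budget lets the planner discover that action 1 leads to a higher long-run discounted reward (9 versus 5 with gamma = 0.9).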