Near Optimal Chernoff Bounds for Markov Decision Processes
Course URL: http://videolectures.net/machine_mihai_moldovan_processes/
Lecturer: Teodor Mihai Moldovan
Institution: University of California, Berkeley
Date: 2013-01-14
Language: English
Abstract: The expected return is a widely used objective in decision making under uncertainty. Many algorithms, such as value iteration, have been proposed to optimize it. In risk-aware settings, however, the expected return is often not an appropriate objective to optimize. We propose a new optimization objective for risk-aware planning and show that it has desirable theoretical properties. We also draw connections to previously proposed objectives for risk-aware planning: minmax, exponential utility, percentile, and mean minus variance. Our method applies to an extended class of Markov decision processes: we allow costs to be stochastic as long as they are bounded. Additionally, we present an efficient algorithm for optimizing the proposed objective. Synthetic and real-world experiments illustrate the effectiveness of our method at scale.
Keywords: expected return; risk-aware planning; Markov decision processes
Source: VideoLectures.NET
Last reviewed: 2019-05-15:lxf
Views: 57