Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences
Course URL: http://videolectures.net/icaps2011_weng_preferences/
Lecturer: Paul Weng
Institution: Université Pierre et Marie Curie (Paris 6)
Date: 2011-07-11
Language: English
Course description: In a standard Markov decision process (MDP), rewards are assumed to be precisely known and of a quantitative nature. In some situations this can be too strong a hypothesis. Even when rewards really can be modeled numerically, specifying the reward function is often difficult, as it is a cognitively demanding and/or time-consuming task. Besides, rewards can sometimes be of a qualitative nature, for instance when they represent qualitative risk levels. In those cases it is problematic to use standard MDPs directly, and we propose instead to resort to MDPs with ordinal rewards, where only a total order over the rewards is assumed to be known. In this setting, we explain how reference points can be exploited as an alternative way to define expressive and interpretable preferences.
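
The abstract only outlines the approach. As one concrete, hypothetical reading of it, the short Python sketch below evaluates a fixed policy in an MDP whose rewards are ordinal labels: the value of a state is a vector of discounted expected counts per reward level, and such vectors are scored against a reference level (counts above the reference weigh positively, counts below negatively). The tiny MDP, the scoring rule reference_score, and all names are illustrative assumptions, not the lecture's exact definitions.

```python
# A minimal sketch (not the lecture's exact formulation): policy evaluation
# for an MDP with ordinal rewards, producing a vector of discounted expected
# counts per reward level, then scored against a hypothetical reference point.
import numpy as np

# Ordinal reward scale: only the order bad < ok < good is assumed known.
LEVELS = ["bad", "ok", "good"]          # index = rank in the total order
GAMMA = 0.95                            # discount factor

# Tiny 2-state MDP under a fixed policy: P[s, s'] = transition probability,
# R[s] = index of the ordinal reward emitted in state s.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
R = np.array([0, 2])                    # state 0 -> "bad", state 1 -> "good"

def ordinal_value(P, R, gamma, n_levels):
    """Discounted expected count of each reward level from each start state.

    Solves the vector-valued analogue of policy evaluation,
    V = E + gamma * P @ V, where E[s, l] = 1 iff state s emits level l.
    """
    n = P.shape[0]
    E = np.zeros((n, n_levels))
    E[np.arange(n), R] = 1.0
    # (I - gamma * P) V = E, solved column by column.
    return np.linalg.solve(np.eye(n) - gamma * P, E)

def reference_score(v, ref_level):
    """Hypothetical scalarization: levels above the reference point count
    positively, levels below count negatively; the reference itself is neutral."""
    return v[ref_level + 1:].sum() - v[:ref_level].sum()

V = ordinal_value(P, R, GAMMA, len(LEVELS))
for s in range(2):
    print(f"state {s}: counts={np.round(V[s], 2)}, "
          f"score vs 'ok'={reference_score(V[s], LEVELS.index('ok')):.2f}")
```

Keeping the value as a per-level count vector means no numeric magnitudes are ever attached to the ordinal labels; only the reference-point comparison turns the vector into a preference, which is the kind of alternative the lecture discusses.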
Keywords: Markov decision processes; MDP; numerical simulation
Source: VideoLectures.NET
Last reviewed: 2020-06-29 (yumf)
Views: 80