0


在随机环境下部分监控游戏的极小极大遗憾

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
课程网址: http://videolectures.net/colt2011_bartok_games/  
主讲教师: Gábor Bartók
开课单位: 苏黎世联邦理工学院
开课时间: 2011-08-02
课程语种: 英语
中文简介:
在部分监控游戏中,学习者反复选择动作,环境响应结果,然后学习者遭受损失并接收反馈信号,这两者都是动作的固定功能和结果。学习者的目标是尽量减少他的后悔,这是他的总累积损失与事后最佳固定行动的总损失之间的差异。假设结果是在i.i.d中生成的。从任意和未知的概率分布出发,我们用有限的许多行为和结果来描述任何部分监控游戏的极小极大遗憾。事实证明,任何此类游戏的极小极大遗憾都是零,Ɵ(√T),Ɵ(T2 / 3)或Ɵ(T)。我们提供了一种计算有效的学习算法,可以在任何游戏的对数因子内实现极小极大的遗憾。
课程简介: In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero,Ɵ(√T), Ɵ(T2/3) or Ɵ(T). We provide a computationally efficient learning algorithm that achieves the minimax regret within logarithmic factor for any game.
关 键 词: 概率; 部分监控游戏; 学习算法
课程来源: 视频讲座网
最后编审: 2020-07-16:yumf
阅读次数: 64