0


附近的一个有限部分监控比赛的对抗对手的优化算法

A near-optimal algorithm for finite partial-monitoring games against adversarial opponents
课程网址: http://videolectures.net/colt2013_bartok_opponents/  
主讲教师: Gábor Bartók
开课单位: 苏黎世联邦理工学院
开课时间: 2013-08-09
课程语种: 英语
中文简介:
局部监控是一种在线学习模式, 在每个时间步骤中, 在学习者和对手选择自己的行为后, 对学习者的损失和反馈都是根据损失和反馈函数计算的, 这两者都是学习者在之前知道的我。与其他在线学习场景一样, 学习者的目标是最大限度地减少累积损失。本文提出并分析了一种新的局部监控游戏的算法。我们证明了我们的算法的预期遗憾是 o ~ (√ n ′ t), 其中 t 是时间范围, n ′ 是最大点局部博弈的大小。与之前的结果相比, 这个约束最重要的改进是, 它并不直接取决于动作的数量, 而是取决于游戏的结构。
课程简介: Partial monitoring is an online learning model where in every time step, after a learner and an opponent choose their actions, the loss and the feedback for the learner is calculated based on a loss and a feedback function, both of which are known to the learner ahead of time. As in other online learning scenarios, the goal of the learner is to minimize his cumulative loss. In this paper we present and analyze a new algorithm for locally observable partial monitoring games. We prove that the expected regret of our algorithm is of O~(√N′T), where T is the time horizon and N′ is the size of the largest point-local game. The most important improvement of this bound compared to previous results is that it does not depend directly on the number of actions, but rather on the structure of the game.
关 键 词: 计算机科学; 机器学习; 优化算法
课程来源: 视频讲座网
最后编审: 2020-06-03:张荧(课程编辑志愿者)
阅读次数: 37