Toward the understanding of partial-monitoring games
Course URL: http://videolectures.net/explorationexploitation2011_szepesvari_t...
Lecturer: Csaba Szepesvári
Institution: University of Alberta
Date: 2011-07-25
Language: English
Abstract: Partial-monitoring games form a common ground for problems such as learning with expert advice, the multi-armed bandit problem, dynamic pricing, the dark pool problem, label-efficient prediction, channel allocation in cognitive radio, and linear and convex optimization with various forms of feedback. How hard it is to learn to play well in a partial-monitoring game can be characterized by the minimax growth rate of the learner's regret. It is well known that some of these games are harder to learn and others easier, depending on how much information one receives and at what cost: some actions in a game might give the learner more information but be pricier, while others might give less information (or none) but be cheaper. When designing a strategy, one should clearly take into account the global information-cost structure of the game. The question asked in this talk is how. What are the good learning strategies? That is, given the structure of a game, what is the corresponding minimax regret, and what algorithm achieves it? In this talk I will review recent progress toward answering this question, as well as some open problems.
Keywords: partial-monitoring games; multi-armed bandit problem; channel allocation
Source: VideoLectures.NET
Last reviewed: 2019-04-14:lxf