Thompson Sampling: a provably good Bayesian heuristic for bandit problems |
|
Course URL: | http://videolectures.net/lsoldm2013_agrawal_thompson_sampling/ |
Lecturer: | Shipra Agrawal |
Institution: | Microsoft Research India |
Date: | 2013-11-07 |
Language: | English |
Abstract: | The multi-armed bandit problem is a basic model for managing the exploration/exploitation trade-off that arises in many situations. Thompson Sampling [Thompson 1933] is one of the earliest heuristics for the multi-armed bandit problem and has recently seen a surge of interest due to its elegance, flexibility, efficiency, and promising empirical performance. In this talk, I will discuss recent results showing that Thompson Sampling achieves near-optimal regret for several popular variants of the multi-armed bandit problem, including linear contextual bandits. Interestingly, these works provide a prior-free, frequentist-type analysis of a Bayesian heuristic, and thereby give rigorous support to the intuition that once you acquire enough data, it doesn't matter what prior you started from, because your posterior will be accurate enough. |
Keywords: | bandit problem; basic model; multi-armed bandit |
Source: | VideoLectures.NET |
Data collected: | 2023-05-15:chenxin01 |
Last reviewed: | 2023-05-18:chenxin01 |
Views: | 38 |