0


带状线性优化的复杂性

On the Complexity of Bandit Linear Optimization
课程网址: https://videolectures.net/videos/colt2015_shamir_linear_optimizat...  
主讲教师: Ohad Shamir
开课单位: 信息不详。欢迎您在右侧留言补充。
开课时间: 2025-02-04
课程语种: 英语
中文简介:
我们研究了带有强盗反馈的在线线性优化问题的可实现遗憾,与全信息设置不同,玩家只能观察到自己的损失,而不是完整的损失向量。我们表明,在这种情况下,强盗信息的价格可以高达$d$,反驳了众所周知的猜想,即强盗线性优化的遗憾最多是全信息遗憾的倍。令人惊讶的是,这是使用标准域的“微不足道”修改来显示的,这在全信息设置中没有影响。我们提出的这一结果和其他结果突出了全信息学习和强盗学习之间的一些有趣的差异,这在以前的文献中没有考虑到。
课程简介: We study the attainable regret for online linear optimization problems with bandit feedback, where unlike the full-information setting, the player can only observe its own loss rather than the full loss vector. We show that the price of bandit information in this setting can be as large as $d$, disproving the well-known conjecture (\cite{dani2007price}) that the regret for bandit linear optimization is at most $\sqrt{d}$ times the full-information regret. Surprisingly, this is shown using ``trivial'' modifications of standard domains, which have no effect in the full-information setting. This and other results we present highlight some interesting differences between full-information and bandit learning, which were not considered in previous literature.
关 键 词: 线性优化; 损失向量; 强盗学习
课程来源: 视频讲座网
数据采集: 2025-03-27:zsp
最后编审: 2025-03-28:liyy
阅读次数: 5