First-order regret bounds for combinatorial semi-bandits |
Course URL: | https://videolectures.net/videos/colt2015_neu_online_learning |
Lecturer: | Gergely Neu |
Institution: | Not specified. |
Course date: | 2025-02-04 |
Language: | English |
Course description: | We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. The performance of the learner is measured in terms of its total expected regret against the best fixed action after $T$ rounds. In this paper, we propose a computationally efficient algorithm that improves existing worst-case guarantees of $O(\sqrt{T})$ to $O(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting. |
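The abstract above defines the interaction protocol (semi-bandit feedback: only the losses of the chosen components are revealed) and the regret benchmark $L_T^*$. A minimal toy sketch of that protocol, with a uniformly random learner standing in for the paper's algorithm (which is not specified here), and a hypothetical decision set of all 2-element subsets of 4 components:

```python
import itertools
import random

# Toy semi-bandit protocol sketch. Assumptions (not from the talk):
# decision set = all 2-element subsets of 4 components, i.i.d.
# Bernoulli losses, and a uniformly random learner. This is NOT the
# paper's algorithm -- it only illustrates the feedback model and
# the regret definition against the best fixed action.

random.seed(0)

d, m, T = 4, 2, 1000
decision_set = list(itertools.combinations(range(d), m))
loss_prob = [0.1, 0.5, 0.5, 0.9]  # hypothetical per-component loss means

cum_loss = [0.0] * d   # full loss history, used only to compute regret
learner_loss = 0.0

for t in range(T):
    action = random.choice(decision_set)            # learner picks an action
    losses = [float(random.random() < p) for p in loss_prob]
    # Semi-bandit feedback: the learner sees losses[i] only for i in action.
    learner_loss += sum(losses[i] for i in action)
    for i in range(d):
        cum_loss[i] += losses[i]

# L_T^* = total loss of the best fixed action in hindsight.
best_loss = min(sum(cum_loss[i] for i in a) for a in decision_set)
regret = learner_loss - best_loss
print(f"learner loss = {learner_loss:.0f}, "
      f"L_T^* = {best_loss:.0f}, regret = {regret:.0f}")
```

A first-order bound of the kind the abstract describes would scale with $\sqrt{L_T^*}$ rather than $\sqrt{T}$, so it is much tighter when the best action's total loss is small.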
Keywords: | combinatorial optimization; total loss; combinatorial decision set |
Source: | VideoLectures.NET |
Data collected: | 2025-03-30: zsp |
Last reviewed: | 2025-03-31: liyy |