First-order regret bounds for combinatorial semi-bandits |
Course URL: | https://videolectures.net/videos/colt2015_neu_online_learning |
Lecturer: | Gergely Neu |
Institution: | Not specified. |
Course date: | 2025-02-04 |
Language: | English |
Course description: | We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. The performance of the learner is measured in terms of its total expected regret against the best fixed action after $T$ rounds. In this paper, we propose a computationally efficient algorithm that improves existing worst-case guarantees of $O(\sqrt{T})$ to $O(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting. |
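The abstract above defines the interaction protocol (semi-bandit feedback: only the losses of the chosen components are revealed) and the regret benchmark $L_T^*$. A minimal toy sketch of that protocol, with a uniformly random learner standing in for the paper's algorithm (which is not specified here), and a hypothetical decision set of all 2-element subsets of 4 components:

```python
import itertools
import random

# Toy semi-bandit protocol sketch. Assumptions (not from the talk):
# decision set = all 2-element subsets of 4 components, i.i.d.
# Bernoulli losses, and a uniformly random learner. This is NOT the
# paper's algorithm -- it only illustrates the feedback model and
# the regret definition against the best fixed action.

random.seed(0)

d, m, T = 4, 2, 1000
decision_set = list(itertools.combinations(range(d), m))
loss_prob = [0.1, 0.5, 0.5, 0.9]  # hypothetical per-component loss means

cum_loss = [0.0] * d   # full loss history, used only to compute regret
learner_loss = 0.0

for t in range(T):
    action = random.choice(decision_set)            # learner picks an action
    losses = [float(random.random() < p) for p in loss_prob]
    # Semi-bandit feedback: the learner sees losses[i] only for i in action.
    learner_loss += sum(losses[i] for i in action)
    for i in range(d):
        cum_loss[i] += losses[i]

# L_T^* = total loss of the best fixed action in hindsight.
best_loss = min(sum(cum_loss[i] for i in a) for a in decision_set)
regret = learner_loss - best_loss
print(f"learner loss = {learner_loss:.0f}, "
      f"L_T^* = {best_loss:.0f}, regret = {regret:.0f}")
```

A first-order bound of the kind the abstract describes would scale with $\sqrt{L_T^*}$ rather than $\sqrt{T}$, so it is much tighter when the best action's total loss is small.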
Keywords: | combinatorial optimization; total loss; combinatorial decision set |
Source: | VideoLectures.NET |
Data collected: | 2025-03-30: zsp |
Last reviewed: | 2025-03-31: liyy |