A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits


Course URL:	http://videolectures.net/icml2015_gajane_dueling_bandits/
Lecturer:	Pratik Gajane
Institution:	SequeL lab
Date:	2015-09-27
Language:	English
Abstract:	We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm, the Relative Exponential-weight algorithm for Exploration and Exploitation (REX3), to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(sqrt(K ln(K) T)) for this algorithm and a general lower bound of order Omega(sqrt(K T)). Finally, we provide experimental results using real data from information retrieval applications.
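For orientation, the following is a minimal Python sketch of standard EXP3, the base algorithm that the abstract says REX3 extends. It is not REX3 itself and is not taken from the lecture: the function names, the `reward_fn` callback, and the rescaling step that keeps the weights from overflowing are all assumptions made for illustration. Rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration at rate gamma."""
    k = len(weights)
    total = sum(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3(reward_fn, k, horizon, gamma, rng):
    """Run EXP3 for `horizon` rounds over `k` arms.

    reward_fn(t, arm) is a hypothetical callback returning a reward in [0, 1].
    Returns the per-arm pull counts and the final sampling distribution.
    """
    weights = [1.0] * k
    pulls = [0] * k
    for t in range(horizon):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: unbiased for the played arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
        # Rescaling by the largest weight leaves the probabilities unchanged
        # and avoids floating-point overflow on long runs (an added safeguard).
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls, exp3_probabilities(weights, gamma)
```

Calling `exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, k=3, horizon=2000, gamma=0.1, rng=random.Random(0))` should concentrate the sampling distribution on arm 2. In the dueling setting described in the abstract, the learner would instead select a pair of arms per round and observe only relative feedback, so this single-arm reward interface does not apply directly.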
Keywords:	multi-armed bandits; weighting algorithms; applications
Source:	VideoLectures.NET
Data collected:	2022-12-07, chenjy
Last reviewed:	2022-12-07, chenjy
