A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits |
|
Course URL: | http://videolectures.net/icml2015_gajane_dueling_bandits/ |
Lecturer: | Pratik Gajane |
Institution: | SequeL Lab |
Date: | 2015-09-27 |
Language: | English |
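Chinese summary (translated): | We study the K-armed dueling bandit problem, a variant of the classical multi-armed bandit (MAB) problem in which the learner receives only relative feedback about the selected pair of arms. We propose a new algorithm, the Relative Exponential-weight algorithm for Exploration and Exploitation (REX3), for the adversarial utility-based formulation of this problem. The algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(sqrt(K ln(K) T)) for the algorithm and a general lower bound of order Ω(sqrt(KT)). Finally, we present experimental results on real data from information retrieval applications. |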
Course description: | We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pair of arms. We propose a new algorithm, called the Relative Exponential-weight algorithm for Exploration and Exploitation (REX3), to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(sqrt(K ln(K) T)) for this algorithm and a general lower bound of order Ω(sqrt(KT)). Finally, we provide experimental results using real data from information retrieval applications. |
Keywords: | multi-armed bandits; weighting algorithms; applications |
Source: | VideoLectures.NET |
Data collected: | 2022-12-07:chenjy |
Last reviewed: | 2022-12-07:chenjy |
Views: | 38 |
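
The description above presents REX3 as an extension of EXP3. For orientation, here is a minimal sketch of the standard EXP3 baseline only. It is not REX3: the pairwise arm selection and the relative-feedback update are not given in this listing and are not reproduced here. The names `exp3`, `reward_fn`, `gamma`, and `rng` are hypothetical choices for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3(reward_fn, num_arms, horizon, gamma, rng):
    """Run standard EXP3 (not REX3) for `horizon` rounds.

    reward_fn(t, arm) must return a reward in [0, 1] (assumption).
    Returns (total_reward, probabilities used in the final round).
    """
    weights = [1.0] * num_arms
    probs = [1.0 / num_arms] * num_arms
    total_reward = 0.0
    for t in range(horizon):
        weight_sum = sum(weights)
        # Mix the exponential-weight distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / weight_sum + gamma / num_arms
                 for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate: unbiased for the pulled arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the largest weight is 1; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return total_reward, probs


if __name__ == "__main__":
    # Toy environment (hypothetical): arm 2 always pays 1, the others pay 0.
    total, final_probs = exp3(lambda t, a: 1.0 if a == 2 else 0.0,
                              num_arms=3, horizon=2000, gamma=0.1,
                              rng=random.Random(0))
    print(total, final_probs)
```

In a dueling setting the learner would not observe `reward` for a single arm but only a comparison between two selected arms, which is the gap the talk's algorithm addresses.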