A Utility-Based Relative Exponential Weighting Algorithm

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Course URL: http://videolectures.net/icml2015_gajane_dueling_bandits/
Lecturer: Pratik Gajane
Institution: SequeL Lab
Course date: 2015-09-27
Language: English
Course description: We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm, the Relative Exponential-weight algorithm for Exploration and Exploitation (REX3), to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(sqrt(K ln(K) T)) for this algorithm and a general lower bound of order Omega(sqrt(KT)). Finally, we provide experimental results using real data from information retrieval applications.
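For orientation, the description presents REX3 as an extension of EXP3. The sketch below illustrates only the standard EXP3 baseline for the classical adversarial MAB setting with absolute rewards in [0, 1]; it is not the REX3 algorithm from the lecture and does not model relative (dueling) feedback. The function name `exp3`, the `reward_fn` callback, and the default `gamma` are assumptions chosen for illustration.

```python
import math
import random


def exp3(num_arms, horizon, reward_fn, gamma=0.1, seed=0):
    """Illustrative EXP3 sketch (hypothetical helper, not REX3).

    reward_fn(t, arm) is assumed to return a reward in [0, 1].
    Returns the final sampling distribution and the total reward collected.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    total_reward = 0.0

    def distribution():
        # Mix the normalised weights with uniform exploration.
        total_w = sum(weights)
        return [(1 - gamma) * w / total_w + gamma / num_arms for w in weights]

    for t in range(horizon):
        probs = distribution()
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward

        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)

        # Rescale so the weights stay bounded over long horizons.
        max_w = max(weights)
        weights = [w / max_w for w in weights]

    return distribution(), total_reward


# Toy environment (an assumption): arm 2 always pays 1, the others pay 0.
final_probs, collected = exp3(3, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)
```

In a dueling variant the learner would instead select a pair of arms each round and observe only which of the two was preferred, which is the setting the lecture addresses.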
Keywords: multi-armed bandits; weighting algorithms; applications
Source: VideoLectures.NET
Data collected: 2022-12-07:chenjy
Last reviewed: 2022-12-07:chenjy
Views: 38