Calibrated fairness in bandits
Course URL: http://videolectures.net/kdd2017_radanovic_bandits/
Lecturer: Goran Radanovic
Institution: Harvard School of Engineering and Applied Sciences
Date: 2017-12-01
Language: English
Course description: We study fairness within the stochastic multi-armed bandit (MAB) decision-making framework. We adapt the fairness framework of “treating similar individuals similarly” [5] to this setting. Here, an ‘individual’ corresponds to an arm, and two arms are ‘similar’ if they have similar quality distributions. First, we adopt a smoothness constraint: if two arms have similar quality distributions, then the probability of selecting each arm should be similar. In addition, we define the fairness regret, which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm equals the probability with which that arm has the best quality realization. We show that a variant of Thompson sampling satisfies smooth fairness for total variation distance, and give an Õ((kT)^(2/3)) bound on fairness regret. This complements prior work [12], which protects an on-average better arm from being less favored. We also explain how to extend our algorithm to the dueling bandit setting.
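To make the calibration idea concrete, below is a minimal Python sketch of plain Thompson sampling for Bernoulli arms with Beta posteriors. The reward model, arm means, horizon, and Monte Carlo estimator are illustrative assumptions, not the paper's setup, and the paper's variant adds a smoothness mechanism for total variation distance that is not reproduced here. The sketch illustrates the standard property that Thompson sampling plays each arm with the posterior probability that its sampled parameter is highest, which approximates the calibration target that fairness regret measures deviation from.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical Bernoulli arm means -- an assumption for this sketch only.
true_means = np.array([0.3, 0.5, 0.6])
k = len(true_means)

# Beta(1, 1) posteriors over each arm's mean (successes/failures + 1).
alpha = np.ones(k)
beta = np.ones(k)

T = 5000
for t in range(T):
    # Thompson sampling: draw one sample per arm from its posterior and
    # play the argmax. Under the posterior, this selects each arm with
    # the probability that its sampled parameter is the largest.
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = int(rng.random() < true_means[arm])
    alpha[arm] += reward
    beta[arm] += 1 - reward

# Monte Carlo estimate of P(arm i is best) under the learned posterior,
# i.e. the calibrated selection distribution the abstract refers to.
draws = rng.beta(alpha, beta, size=(100_000, k))
p_best = np.bincount(draws.argmax(axis=1), minlength=k) / draws.shape[0]
print("calibrated selection probabilities:", np.round(p_best, 3))
```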
Keywords: calibrated fairness; fairness framework; multi-armed bandits; quality distribution
Source: VideoLectures.NET
Data collected: 2023-04-22: chenxin01
Last reviewed: 2023-05-18: chenxin01
Views: 34