
Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback
Course URL: http://videolectures.net/aistats2011_saha_guarantees/
Lecturer: Ankan Saha
Institution: University of Chicago
Date: not available
Language: English
Course description: The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). This setting models a decision maker who has to make decisions in the face of adversarially chosen convex loss functions. Moreover, the only information the decision maker receives is the losses incurred; the identities of the loss functions themselves are not revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e. convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret is O(T^{2/3}), ignoring constant and logarithmic factors.
Keywords: bandit setting; online convex optimization; gradient estimation
Source: VideoLectures.NET
Last reviewed: 2019-11-16:cwx
Views: 22
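
The description mentions one-point gradient estimation. As a rough illustration only, the sketch below shows the classic spherical one-point estimator in the style of Flaxman et al. (2005): the learner plays a randomly perturbed point, observes a single scalar loss, and rescales the perturbation direction by that loss. It is not the algorithm from the talk and leaves out the self-concordant regularizer entirely. The names `sample_unit_sphere`, `one_point_gradient_estimate`, `average_estimate`, `linear_loss`, and the vector `A` are hypothetical and chosen for this example.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a point uniformly at random from the unit sphere in R^d."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def one_point_gradient_estimate(loss, x, delta, rng):
    """Estimate a gradient near x from a single loss evaluation.

    Plays y = x + delta * u with u uniform on the unit sphere and
    returns (d / delta) * loss(y) * u.
    """
    d = len(x)
    u = sample_unit_sphere(d, rng)
    y = [xi + delta * ui for xi, ui in zip(x, u)]  # point actually played
    value = loss(y)  # bandit feedback: one scalar, nothing else
    return [(d / delta) * value * ui for ui in u]


def average_estimate(loss, x, delta, n_samples, seed=0):
    """Average many independent one-point estimates (illustration only)."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        g = one_point_gradient_estimate(loss, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


# Assumed toy loss: linear, so its true gradient is A everywhere.
A = [1.0, -2.0, 0.5]


def linear_loss(y):
    return sum(a * c for a, c in zip(A, y))


if __name__ == "__main__":
    est = average_estimate(linear_loss, [0.5, -0.2, 0.1], 1.0, 100_000)
    print("true gradient:", A)
    print("averaged one-point estimate:", est)
```

For a linear loss the estimator is unbiased, so the average should approach `A` as the number of samples grows. A single estimate is very noisy, with magnitude scaling like d/delta, and handling that noise is a central difficulty in bandit convex optimization.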