
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Course URL: http://videolectures.net/nipsworkshops2011_shamir_convex/
Lecturer: Ohad Shamir
Institution: Microsoft
Date: 2012-01-25
Language: English
Course description: Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step suffices to recover the O(1/T) rate, and no other change of the algorithm is necessary. We also present experimental results which support our findings, and point out open problems.
Keywords: stochastic gradient descent; optimization problems; convergence rate
Source: VideoLectures.NET
Data collected: 2022-12-16:chenjy
Last reviewed: 2022-12-16:chenjy
Views: 20
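
The course description above contrasts returning the average of all SGD iterates with "a simple modification of the averaging step". The following Python sketch illustrates that contrast on a toy problem. It is only an illustration under stated assumptions: the description does not spell out the modification, so the sketch assumes it is suffix averaging (averaging only the last fraction alpha of the iterates). The toy objective F(w) = (lam/2) w^2 + |w| on [-1, 1], the Gaussian gradient noise, the step size 1/(lam*t), and every function and parameter name are hypothetical choices made for this example, not taken from the lecture.

```python
import random


def sgd_iterates(T, lam=1.0, seed=0):
    """Projected SGD on the toy objective F(w) = (lam/2) w^2 + |w| over [-1, 1].

    Assumed setup (not from the source): step size 1/(lam * t) and
    unit-variance Gaussian noise added to each subgradient.
    Returns the list of the T iterates w_1, ..., w_T.
    """
    rng = random.Random(seed)
    w = 1.0
    iterates = []
    for t in range(1, T + 1):
        iterates.append(w)
        # Subgradient of |w|: sign(w), taking 0 at w == 0.
        sub = 1.0 if w > 0 else (-1.0 if w < 0 else 0.0)
        g = lam * w + sub + rng.gauss(0.0, 1.0)  # noisy subgradient of F
        w = w - g / (lam * t)                    # SGD step
        w = max(-1.0, min(1.0, w))               # project back onto [-1, 1]
    return iterates


def full_average(iterates):
    """Standard averaging: the mean of all iterates."""
    return sum(iterates) / len(iterates)


def suffix_average(iterates, alpha=0.5):
    """Assumed modification: the mean of the last alpha fraction of iterates."""
    k = max(1, int(round(alpha * len(iterates))))
    tail = iterates[-k:]
    return sum(tail) / len(tail)


def objective(w, lam=1.0):
    """The toy objective; its minimizer is w = 0 with value 0."""
    return 0.5 * lam * w * w + abs(w)


if __name__ == "__main__":
    ws = sgd_iterates(T=10000)
    print("full average    :", objective(full_average(ws)))
    print("suffix average  :", objective(suffix_average(ws, alpha=0.5)))
```

Both averaging rules consume the same iterates, so the SGD loop itself is unchanged and only the final averaging step differs, which is the kind of change the description refers to. A single run on a toy problem like this one is not evidence for the stated convergence rates.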