

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Course URL: http://videolectures.net/nipsworkshops2011_shamir_convex/
Lecturer: Ohad Shamir
Institution: Microsoft
Date: 2012-01-25
Language: English
Course description: Stochastic gradient descent (SGD) is a simple and popular method for solving the stochastic optimization problems that arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), obtained by running SGD for T iterations and returning the average of the iterates. However, recent results showed that, using a different algorithm, one can obtain the optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and perhaps should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step suffices to recover the O(1/T) rate, and no other change to the algorithm is necessary. We also present experimental results that support our findings, and point out open problems.
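As a rough illustration of the two averaging schemes contrasted in the description, the following minimal sketch runs SGD with step size η_t = 1/(λt) on a toy strongly convex, non-smooth objective and compares the uniform average of all iterates with a suffix average (the mean of the last α·T iterates), which is one way to modify the averaging step along the lines described above. The objective, dimension, and the value of α are illustrative assumptions and are not taken from the lecture or the paper.

```python
# Minimal sketch (not from the lecture): SGD with step size eta_t = 1/(lam*t)
# on a toy strongly convex, non-smooth objective, comparing uniform averaging
# of all iterates with alpha-suffix averaging (mean of the last alpha*T iterates).
# The objective, data distribution, and alpha are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

lam = 1.0        # strong-convexity parameter (assumed)
alpha = 0.5      # suffix fraction (assumed)
T = 10_000       # number of SGD iterations
d = 5            # dimension (assumed)

# Toy objective: F(w) = (lam/2)*||w||^2 + E_x[ ||w - x||_1 ], x ~ N(0, I).
# The L1 term makes F non-smooth; the quadratic term makes it lam-strongly convex.
def stochastic_subgradient(w):
    x = rng.standard_normal(d)
    return lam * w + np.sign(w - x)

w = np.zeros(d)
running_sum = np.zeros(d)            # accumulates all iterates (uniform average)
suffix_sum = np.zeros(d)             # accumulates only the last alpha*T iterates
suffix_start = int((1 - alpha) * T)

for t in range(1, T + 1):
    g = stochastic_subgradient(w)
    w = w - g / (lam * t)            # eta_t = 1 / (lam * t)
    running_sum += w
    if t > suffix_start:
        suffix_sum += w

w_avg = running_sum / T
w_suffix = suffix_sum / (T - suffix_start)
print("uniform average :", np.round(w_avg, 4))
print("alpha-suffix avg:", np.round(w_suffix, 4))
```

For this toy objective the minimizer is the origin, so both averaged points should land near zero; the sketch is only meant to show how the suffix average is formed from the same SGD iterates, not to reproduce the paper's rate comparison.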
Keywords: stochastic gradient descent; optimization problems; convergence rate
Source: VideoLectures.NET
Data collected: 2022-12-16 (chenjy)
Last reviewed: 2022-12-16 (chenjy)
Views: 20