


Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
Course URL: http://videolectures.net/nips2011_bach_machine/
Lecturer: Francis R. Bach
Institution: INRIA
Date: 2012-01-19
Language: English
Course description: We consider the minimization of a convex objective function defined on a Hilbert space, which is only available through unbiased estimates of its gradients. This problem includes standard machine learning algorithms such as kernel logistic regression and least-squares regression, and is commonly referred to as a stochastic approximation problem in the operations research community. We provide a non-asymptotic analysis of the convergence of two well-known algorithms, stochastic gradient descent (a.k.a. the Robbins-Monro algorithm) as well as a simple modification where iterates are averaged (a.k.a. Polyak-Ruppert averaging). Our analysis suggests that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate in the strongly convex case, is not robust to the lack of strong convexity or the setting of the proportionality constant. This situation is remedied when using slower decays together with averaging, robustly leading to the optimal rate of convergence. We illustrate our theoretical results with simulations on synthetic and standard datasets.
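
As a concrete illustration of the two schedules contrasted in the abstract, the following is a minimal sketch in Python (not code from the lecture) of stochastic gradient descent on a synthetic least-squares problem. The update is theta_n = theta_{n-1} - gamma_n * g_n(theta_{n-1}), where g_n is an unbiased gradient estimate and gamma_n = C/n (Robbins-Monro) or C/sqrt(n) combined with Polyak-Ruppert averaging of the iterates. The dimension, the constant C, the noise level, and the data distribution are all illustrative assumptions.

    import numpy as np

    # Minimal sketch: SGD on a synthetic least-squares problem, comparing a
    # 1/n learning rate with a slower 1/sqrt(n) decay plus Polyak-Ruppert averaging.
    # All dimensions, constants, and the data distribution are illustrative choices.

    rng = np.random.default_rng(0)
    d, n_iter = 10, 10_000
    theta_star = rng.standard_normal(d)          # ground-truth parameter

    def sample():
        """Draw one (x, y) pair with additive Gaussian noise."""
        x = rng.standard_normal(d)
        y = x @ theta_star + 0.1 * rng.standard_normal()
        return x, y

    def sgd(step_size, n_iter=n_iter):
        """Run SGD with the given step-size schedule; return last and averaged iterates."""
        theta = np.zeros(d)
        theta_bar = np.zeros(d)
        for n in range(1, n_iter + 1):
            x, y = sample()
            grad = (x @ theta - y) * x            # unbiased estimate of the gradient
            theta -= step_size(n) * grad
            theta_bar += (theta - theta_bar) / n  # running Polyak-Ruppert average
        return theta, theta_bar

    C = 1.0  # proportionality constant (its choice matters most for the 1/n schedule)
    last_inv, _ = sgd(lambda n: C / n)                    # 1/n rate (Robbins-Monro)
    last_sqrt, avg_sqrt = sgd(lambda n: C / np.sqrt(n))   # slower decay + averaging

    for name, est in [("1/n, last iterate", last_inv),
                      ("1/sqrt(n), last iterate", last_sqrt),
                      ("1/sqrt(n), averaged", avg_sqrt)]:
        print(f"{name:>26s}: error = {np.linalg.norm(est - theta_star):.4f}")

Varying C or making the covariance of x ill-conditioned in this sketch is one way to observe the robustness issue described in the abstract: the 1/n schedule is sensitive to these choices, whereas the slower decay with averaging remains well behaved.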
Keywords: optimization methods; stochastic optimization; computer science; machine learning
Source: VideoLectures.NET
Last reviewed: 2020-07-29 by yumf
Views: 84