Lock-Free Approaches to Parallelizing Stochastic Gradient Descent
Course URL: | http://videolectures.net/nipsworkshops2011_recht_lockfree/
Lecturer: | Benjamin Recht
Institution: | University of California, Berkeley
Date: | 2012-01-25
Language: | English
Course summary: | Stochastic Gradient Descent (SGD) is a very popular optimization algorithm for solving data-driven machine learning problems. SGD is well suited to processing large amounts of data due to its robustness against noise, rapid convergence rates, and predictable memory footprint. Nevertheless, SGD seems to be impeded by many of the classical barriers to scalability: (1) SGD appears to be inherently sequential, (2) SGD assumes uniform sampling from the underlying data set resulting in poor locality, and (3) current approaches to parallelize SGD require performance-destroying, fine-grained communication. This talk aims to refute the conventional wisdom that SGD inherently suffers from these impediments. Specifically, I will show that SGD can be implemented in parallel with minimal communication, with no locking or synchronization, and with strong spatial locality. I will provide both theoretical and experimental evidence demonstrating the achievement of linear speedups on multicore workstations on several benchmark optimization problems. Finally, I will close with a discussion of a challenging problem raised by our implementations relating arithmetic and geometric means of matrices. (An illustrative sketch of the lock-free update pattern appears after this entry.)
Keywords: | stochastic gradient descent; optimization algorithms; machine learning
Source: | 视频讲座网
Last reviewed: | 2020-05-31: 王勇彬 (volunteer course editor)
Views: | 108
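
The summary above argues that SGD can be run in parallel with no locking or synchronization when updates are sparse, since concurrent threads then touch mostly disjoint coordinates. As an illustration only, and not the speaker's implementation, the Python sketch below runs Hogwild-style lock-free SGD on a hypothetical sparse least-squares problem; the problem sizes, step size, and thread count are assumptions, and under CPython's global interpreter lock the threads mainly demonstrate the unsynchronized access pattern rather than a true multicore speedup.

import threading
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse least-squares data: each example touches only a few
# coordinates, so concurrent updates rarely collide.
n_features, n_examples, support = 1000, 20000, 5
w_true = rng.normal(size=n_features)
examples = []
for _ in range(n_examples):
    idx = rng.choice(n_features, size=support, replace=False)
    x = rng.normal(size=support)
    examples.append((idx, x, x @ w_true[idx] + 0.01 * rng.normal()))

w = np.zeros(n_features)   # shared parameter vector, updated without any lock
step = 0.05                # assumed constant step size

def worker(example_ids):
    # Plain SGD on the shared vector: read possibly stale coordinates,
    # then write the sparse update back in place with no synchronization.
    for i in example_ids:
        idx, x, y = examples[i]
        err = x @ w[idx] - y
        w[idx] -= step * err * x

threads = [threading.Thread(target=worker, args=(range(t, n_examples, 4),))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("mean squared parameter error:", float(np.mean((w - w_true) ** 2)))

The point mirrored here is that each update reads and writes only the few coordinates in one example's support, so even without any lock the chance of two threads overwriting the same coordinate at the same time is small, which is what makes the unsynchronized scheme viable.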