

Speeding Up Stochastic Gradient Descent
Course URL: http://videolectures.net/eml07_bengio_ssg/
Lecturer: Yoshua Bengio
Institution: Université de Montréal
Date: 2007-12-29
Language: English
Course description: In order to tackle large-scale learning problems whose solution necessarily involves a large model with many tunable parameters, difficult non-convex optimization has to be performed efficiently. Computational complexity arguments strongly suggest that deep architectures will be necessary to represent the kind of complex functions that AI involves. Unfortunately, this leads to difficult optimization problems, and efficient approximate iterative optimization becomes key to obtaining good generalization, rather than the regularization techniques that have been so well studied over the last two decades. Furthermore, because of the size of the data sets involved in such tasks, it is imperative that computation scale no more than linearly with the number of training examples. In many cases, the algorithm to beat is stochastic gradient descent, and comparisons have to be made by looking at the curve of test error versus computation time. Following recent interest in online versions of second-order optimization methods, we present computational tricks that yield a linear-time variant of natural gradient optimization. Another issue, particularly difficult to address in the optimization of multi-layer neural networks, is how to parallelize efficiently. With SMP machines becoming cheaper and easier to use, we compare and discuss different strategies for exploiting parallelization when training multi-layer neural networks, showing that naive approaches fail but those taking the communication bottleneck into account yield impressive speed-ups.
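The comparison the description argues for is test error as a function of computation time, with plain stochastic gradient descent as the baseline. The sketch below illustrates that comparison on a synthetic logistic-regression task (an assumption for illustration, not data from the lecture); the diagonal gradient preconditioner is only a crude, hypothetical stand-in for the linear-time natural gradient variant mentioned above, not the method presented in the talk.

```python
# Minimal sketch: plain SGD vs. a diagonally preconditioned variant on
# logistic regression, logging test error against wall-clock time.
# The preconditioner is a simple curvature proxy, NOT the lecture's
# linear-time natural gradient algorithm.
import time
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (assumed; any dataset would do).
n_train, n_test, d = 5000, 1000, 50
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
y_train = (X_train @ w_true + 0.5 * rng.normal(size=n_train) > 0).astype(float)
X_test = rng.normal(size=(n_test, d))
y_test = (X_test @ w_true > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def test_error(w):
    # Fraction of misclassified test examples.
    return np.mean((sigmoid(X_test @ w) > 0.5) != y_test)

def sgd(lr=0.1, epochs=5, precondition=False, eps=1e-8):
    """Per-example SGD; with precondition=True, scale each gradient by a
    running estimate of its second moment (a diagonal stand-in for the
    Fisher matrix used by natural gradient)."""
    w = np.zeros(d)
    second_moment = np.zeros(d)
    t0 = time.time()
    curve = []  # (seconds elapsed, test error) pairs, one per epoch
    for _ in range(epochs):
        for i in rng.permutation(n_train):
            x, y = X_train[i], y_train[i]
            g = (sigmoid(x @ w) - y) * x  # gradient of the log loss
            if precondition:
                second_moment = 0.99 * second_moment + 0.01 * g * g
                g = g / np.sqrt(second_moment + eps)
            w -= lr * g
        curve.append((time.time() - t0, test_error(w)))
    return w, curve

_, sgd_curve = sgd()
_, pre_curve = sgd(precondition=True)
print("plain SGD     :", [f"{t:.2f}s -> {e:.3f}" for t, e in sgd_curve])
print("preconditioned:", [f"{t:.2f}s -> {e:.3f}" for t, e in pre_curve])
```

Plotting the two returned curves (seconds elapsed versus test error) gives the kind of comparison the description advocates: a method only wins if it reaches lower test error in less wall-clock time, not merely in fewer iterations.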
Keywords: large models; iterative optimization; stochastic gradient
Course source: VideoLectures.NET
Last reviewed: 2019-04-10 (lxf)
Views: 77