Learning Representations: A Challenge for Learning Theory
Course URL: http://videolectures.net/colt2013_lecun_theory/
Lecturer: Yann LeCun
Institution: New York University
Date: 2013-08-09
Language: English
Course description: Perceptual tasks such as vision and audition require the construction of good features, or good internal representations of the input. Deep Learning designates a set of supervised and unsupervised methods for constructing feature hierarchies automatically by training systems composed of multiple stages of trainable modules. The recent history of OCR, speech recognition, and image analysis indicates that deep learning systems yield higher accuracy than systems that rely on hand-crafted features or "shallow" architectures whenever more training data and more computational resources become available. Deep learning systems, particularly convolutional nets, hold the performance record in a wide variety of benchmarks and competitions, including object recognition in images, semantic image labeling (2D and 3D), acoustic modeling for speech recognition, drug design, handwriting recognition, pedestrian detection, and road sign recognition. The most recent speech recognition and image analysis systems deployed by Google, IBM, Microsoft, Baidu, NEC, and others all use deep learning, and many use convolutional nets. While the practical successes of deep learning are numerous, so are the theoretical questions that surround it. What can circuit complexity theory tell us about deep architectures, with their multiple sequential steps of computation, compared to, say, kernel machines with simple kernels that have only two steps? What can learning theory tell us about unsupervised feature learning? What can theory tell us about the properties of deep architectures composed of layers that expand the dimension of their input (e.g. sparse coding), followed by layers that reduce it (e.g. pooling)? What can theory tell us about the properties of the non-convex objective functions that arise in deep learning? Why is it that the best-performing deep learning systems happen to be ridiculously over-parameterized, with regularization so aggressive that it borders on genocide?
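The expand-then-reduce layer pattern the description asks about can be illustrated with a minimal sketch: an overcomplete linear map with a ReLU lifts the input to a higher dimension (in the spirit of sparse coding), and non-overlapping max pooling brings it back down. The dimensions, the ReLU nonlinearity, and the random dictionary below are illustrative assumptions, not details from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def expand(x, W):
    """Overcomplete linear map + ReLU: lifts x from dim d to dim k > d."""
    return np.maximum(0.0, W @ x)

def pool(z, width):
    """Non-overlapping max pooling: reduces dim k to k // width."""
    return z.reshape(-1, width).max(axis=1)

d, k, width = 8, 32, 4            # expand 8 -> 32, then pool 32 -> 8
W = rng.standard_normal((k, d))   # random "dictionary" (assumption)
x = rng.standard_normal(d)        # toy input

z = expand(x, W)                  # expanded code, shape (32,)
y = pool(z, width)                # pooled output, shape (8,)
```

The theoretical questions above concern exactly this composition: what such a dimension-expanding, then dimension-reducing, stack can represent and how it behaves under training.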
Keywords: computer science; machine learning; data structures
Source: VideoLectures.NET
Last reviewed: 2021-01-30:nkq
Views: 125