More Data Less Work: Runtime As A Monotonically Decreasing Function of Data Set Size
Course URL: http://videolectures.net/mlss09us_srebro_mdlwrmdfdss/
Lecturer: Nathan Srebro
Institution: Toyota Technological Institute at Chicago
Date: 2009-07-30
Language: English
Course description: We are used to studying runtime as an increasing function of the data set size, and are happy when this increase is not so bad (e.g. when the runtime increases linearly, or even polynomially, with the data set size). Traditional runtime analysis of learning is viewed the same way, and studies how training runtime increases as more data becomes available. However, considering the true objective of training, which is to obtain a good predictor, I will argue that training runtime should actually be studied as a *decreasing* function of the training set size. Focusing on training Support Vector Machines (SVMs), and combining ideas from optimization, statistical learning theory, and online methods, I will present both theoretical and empirical results demonstrating how a simple stochastic subgradient descent approach indeed displays such monotonically decreasing behavior. I will also discuss a similar phenomenon in the context of Gaussian mixture clustering, where it appears that excess data turns the problem from computationally intractable to computationally tractable. Joint work with Shai Shalev-Shwartz, Karthik Sridharan, Yoram Singer, Greg Shakhnarovich and Sam Roweis.
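The stochastic subgradient approach referred to in the abstract is, in spirit, a Pegasos-style update for the regularized hinge-loss SVM objective. The Python sketch below is illustrative only and not taken from the lecture: the name pegasos_svm, the 1/(lam*t) step-size schedule, and all parameter values are assumptions. The point it illustrates is that each iteration touches a single example, so the per-step cost does not grow with the training set size.

import numpy as np

def pegasos_svm(X, y, lam=0.1, n_iters=1000, seed=None):
    """Stochastic subgradient descent (Pegasos-style sketch) for
    min_w  (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>),
    with X of shape (n, d) and labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)             # pick one training example at random
        margin = y[i] * X[i].dot(w)     # margin under the current iterate
        eta = 1.0 / (lam * t)           # decreasing step size
        w *= 1.0 - eta * lam            # gradient step on the regularizer
        if margin < 1.0:                # hinge loss active: take its subgradient
            w += eta * y[i] * X[i]
    return w

# Illustrative use on synthetic data (assumed, not from the lecture):
# rng = np.random.default_rng(0)
# X = rng.normal(size=(1000, 20))
# y = np.where(X[:, 0] > 0, 1.0, -1.0)
# w = pegasos_svm(X, y, lam=0.01, n_iters=2000)

Under this view, the number of such cheap iterations needed to reach a fixed generalization error can shrink as more data becomes available, which is the sense in which training runtime decreases with training set size.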
Keywords: data set; increasing function; stochastic subgradient descent
Course source: Toyota Technological Institute at Chicago
Last reviewed: 2019-08-09 (cjy)
Views: 72