首页物理学
0


快速近似谱聚类

Fast Approximate Spectral Clustering
课程网址: http://videolectures.net/kdd09_yan_fasc/  
主讲教师: Donghui Yan
开课单位: 加州大学伯克利分校
开课时间: 2009-09-14
课程语种: 英语
中文简介:
谱聚类是指一类灵活的聚类过程,它可以在小数据集上产生高质量的聚类,但由于其一般的O(n3)的计算复杂性而对大规模问题的适用性有限,n数据的数量点。我们通过开发快速近似谱聚类的一般框架来扩展谱聚类的范围,其中首先将失真最小化的局部变换应用于数据。该框架基于理论分析,该分析提供局部失真对误聚类率的影响的统计表征。我们开发了一般框架的两个具体实例,一个基于局部k均值聚类(KASP),另一个基于随机投影树(RASP)。大量实验表明,这些算法可以实现显着的加速,而聚类精度几乎没有降低。具体来说,我们的算法在准确度方面表现优于k-means,并且运行速度比基于Nystrom方法的近似谱聚类快几倍,具有可比较的精度和显着更小的内存占用。值得注意的是,我们的算法使单台机器能够在几分钟内对数百万个观测值进行光谱聚类数据集。
课程简介: Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n3) in general, with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nystrom method, with comparable accuracy and significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to spectral cluster data sets with a million observations within several minutes.
关 键 词: 谱聚类; 局部畸变; 局部k-均值聚类; 随机投影树
课程来源: 视频讲座网
最后编审: 2020-06-01:吴雨秋(课程编辑志愿者)
阅读次数: 251