Multi-View Clustering via Canonical Correlation Analysis
Course URL: http://videolectures.net/icml09_chaudhuri_mvc/
Lecturer: Kamalika Chaudhuri
Institution: University of California, San Diego
Date: 2009-08-26
Language: English
Abstract: Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to succeed are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log-concave distributions. We also provide empirical support from audio-visual speaker clustering (where we want the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure).
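The pipeline the abstract describes (project with CCA, then cluster in the low-dimensional subspace) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes two real-valued views of the same points and uses standard regularized CCA: each view is whitened, then the SVD of the whitened cross-covariance is taken. View 1 is projected onto its top k - 1 canonical directions, and plain k-means serves as the single-view clustering step. The function names, the ridge term `reg`, and the synthetic data are hypothetical choices made for illustration.

```python
import numpy as np


def _inv_sqrt(cov: np.ndarray, reg: float) -> np.ndarray:
    """Return (cov + reg * I)^(-1/2) via a symmetric eigendecomposition."""
    vals, vecs = np.linalg.eigh(cov + reg * np.eye(cov.shape[0]))
    return (vecs / np.sqrt(vals)) @ vecs.T


def cca_view1_directions(x1, x2, dim, reg=1e-6):
    """Top-`dim` canonical directions for view 1 and the canonical correlations."""
    n = x1.shape[0]
    x1c = x1 - x1.mean(axis=0)
    x2c = x2 - x2.mean(axis=0)
    c11 = x1c.T @ x1c / n
    c22 = x2c.T @ x2c / n
    c12 = x1c.T @ x2c / n
    w1, w2 = _inv_sqrt(c11, reg), _inv_sqrt(c22, reg)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, _ = np.linalg.svd(w1 @ c12 @ w2)
    return w1 @ u[:, :dim], s[:dim]


def kmeans(z, k, iters=100, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = [z[rng.integers(len(z))]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(z[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def multiview_cluster(x1, x2, k, reg=1e-6, seed=0):
    """Project view 1 onto its top k - 1 canonical directions, then run k-means."""
    proj, corr = cca_view1_directions(x1, x2, k - 1, reg)
    z = (x1 - x1.mean(axis=0)) @ proj
    return kmeans(z, k, seed=seed), corr


if __name__ == "__main__":
    # Synthetic two-cluster data whose views are independent given the label y.
    rng = np.random.default_rng(1)
    n = 400
    y = rng.integers(0, 2, size=n)
    x1 = rng.normal(size=(n, 20))
    x1[:, 0] += 6.0 * y   # cluster signal in view 1
    x1[:, 1] *= 10.0      # high-variance nuisance direction, unrelated to y or view 2
    x2 = rng.normal(size=(n, 30))
    x2[:, 0] += 6.0 * y   # cluster signal in view 2
    labels, corr = multiview_cluster(x1, x2, k=2)
    acc = max(np.mean(labels == y), np.mean(labels != y))
    print(f"top canonical correlation {corr[0]:.2f}, accuracy {acc:.3f}")
```

In the demo, view 1 has a coordinate with variance 100 that carries no cluster information, while the cluster direction has variance of about 10. PCA would rank the nuisance coordinate first. CCA ignores it because it is uncorrelated with the other view, which illustrates why the assumption that the views are uncorrelated given the cluster label helps.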
Keywords: clustering high-dimensional data; efficient clustering algorithms; mixtures of Gaussians
Source: VideoLectures.NET
Last edited: 2019-04-21 (lxf)