

Multiview Clustering via Canonical Correlation Analysis
Course URL: http://videolectures.net/lms08_livescu_mvc/
Lecturer: Karen Livescu
Institution: Toyota Technological Institute at Chicago
Date: 2008-12-20
Language: English
Course description: Clustering algorithms such as k-means perform poorly when the data is high-dimensional. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via principal component analysis (PCA) or random projections, before clustering. Such techniques typically impose stringent requirements on the separation between the cluster means. Here we present ongoing work on projection-based clustering that addresses this using multiple views of the data. We use canonical correlation analysis (CCA) to project the data in each view to a lower-dimensional subspace. Under the assumption that the correlated dimensions capture the information about the cluster identities, the separation conditions required for the algorithm to be successful are significantly weaker than those of prior results in the literature. We describe experiments on two domains, (a) speech audio and images of the speakers’ faces, and (b) text and links in Wikipedia articles. We discuss several issues that arise when clustering in these domains, in particular the existence of multiple possible “cluster variables” and of a hierarchical cluster structure.
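The project-then-cluster idea in the description can be illustrated with a minimal sketch. This is not the lecture's implementation: it assumes NumPy, synthetic two-view data in which the views are correlated only through the cluster identity, a textbook CCA computed by whitening followed by an SVD, and plain Lloyd's k-means as the final clustering step. All function names (cca_project, kmeans, make_two_views, purity, demo) are hypothetical.

```python
import numpy as np


def cca_project(X, Y, k, reg=1e-6):
    """Project view X onto its top-k canonical directions with respect to view Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularised within-view covariances and the cross-view covariance.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each view with its Cholesky factor: M = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Ly, np.linalg.solve(Lx, Cxy).T).T
    # Singular values of M are the sample canonical correlations.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U[:, :k])  # canonical directions for X
    return Xc @ A, s[:k]


def kmeans(Z, k, n_iter=50, seed=0):
    """Lloyd's algorithm with a farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(k - 1):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def make_two_views(n=900, n_clusters=3, dim=20, seed=0):
    """Synthetic data (an assumption): noise is independent across the two views."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)
    views = []
    for _ in range(2):
        means = rng.normal(scale=3.0, size=(n_clusters, dim))
        noise_scale = np.ones(dim)
        noise_scale[:5] = 5.0  # a few high-variance nuisance dimensions
        views.append(means[labels] + rng.normal(size=(n, dim)) * noise_scale)
    return views[0], views[1], labels


def purity(found, truth):
    """Fraction of points that fall in the majority true class of their cluster."""
    total = 0
    for c in np.unique(found):
        total += np.bincount(truth[found == c]).max()
    return total / len(truth)


def demo():
    X, Y, truth = make_two_views()
    # With 3 clusters, the cluster means span at most 2 dimensions after centring.
    Z, _ = cca_project(X, Y, k=2)
    found = kmeans(Z, k=3)
    return Z.shape, purity(found, truth)


if __name__ == "__main__":
    print(demo())
```

Because the noise in the two synthetic views is independent, the only directions that remain correlated across views are those carrying the cluster means, which is the assumption the description states. Clustering only the first view's projection is a simplification made for brevity.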
Keywords: clustering algorithms; principal component analysis; data views
Source: VideoLectures.NET
Last reviewed: 2019-05-15:cjy
Views: 83