Co-Clustering on Manifolds
Course URL: http://videolectures.net/kdd09_zhou_ccom/
Lecturer: Jie Zhou
Institution: Tsinghua University
Date: 2009-09-14
Language: English
Course description: Co-clustering is based on the duality between data points (e.g. documents) and features (e.g. words): data points can be grouped based on their distribution over features, while features can be grouped based on their distribution over the data points. In the past decade, several co-clustering algorithms have been proposed and shown to be superior to traditional one-sided clustering. However, existing co-clustering algorithms fail to consider the geometric structure in the data, which is essential for clustering data on a manifold. To address this problem, in this paper, we propose a Dual Regularized Co-Clustering (DRCC) method based on semi-nonnegative matrix tri-factorization. We assume that not only the data points but also the features are sampled from manifolds, namely the data manifold and the feature manifold respectively. Accordingly, we construct two graphs, the data graph and the feature graph, to explore the geometric structure of the data manifold and the feature manifold. Our co-clustering method is then formulated as semi-nonnegative matrix tri-factorization with two graph regularizers, requiring that the cluster labels of data points are smooth with respect to the data manifold, while the cluster labels of features are smooth with respect to the feature manifold. We show that DRCC can be solved via alternating minimization, and that its convergence is theoretically guaranteed. Clustering experiments on many benchmark data sets demonstrate that the proposed method outperforms many state-of-the-art clustering methods.
Keywords: clustering; data manifold; cluster labels
Source: VideoLectures.NET
Last reviewed: 2020-06-01:wuyq
Views: 308
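
Illustrative sketch (not part of the original record): the course description formulates DRCC as semi-nonnegative matrix tri-factorization with two graph regularizers. The Python sketch below shows one plausible way to write such an objective. Several things here are assumptions not stated in the description: the symbols X (features by data points), G, S, F, the weights lam and mu, and the choice of a symmetrized 0/1 k-nearest-neighbor graph with an unnormalized Laplacian. The helper names knn_laplacian and drcc_objective are hypothetical. The sketch only evaluates the objective and does not reproduce the alternating-minimization updates of the talk.

```python
import numpy as np


def knn_laplacian(points, k):
    """Unnormalized Laplacian L = D - W of a symmetrized 0/1 k-NN graph.

    `points` has one sample per row. The 0/1 edge weighting is an assumption.
    """
    n = points.shape[0]
    sq = np.sum(points ** 2, axis=1)
    # Pairwise squared Euclidean distances.
    dist = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), neighbors.ravel()] = 1.0
    W = np.maximum(W, W.T)  # keep an edge if either endpoint selects it
    return np.diag(W.sum(axis=1)) - W


def drcc_objective(X, G, S, F, L_data, L_feat, lam, mu):
    """Reconstruction error plus two graph-smoothness penalties.

    Assumed shapes: X is (d, n), G is (d, m) feature labels, S is (m, c),
    F is (n, c) data-point labels. G and F are meant to be nonnegative;
    S and X are unconstrained (the "semi-nonnegative" part).
    """
    recon = np.linalg.norm(X - G @ S @ F.T, "fro") ** 2
    smooth_data = np.trace(F.T @ L_data @ F)  # labels smooth on data graph
    smooth_feat = np.trace(G.T @ L_feat @ G)  # labels smooth on feature graph
    return recon + lam * smooth_data + mu * smooth_feat
```

Under these assumptions, the data graph would be built from the columns of X with knn_laplacian(X.T, k) and the feature graph from its rows with knn_laplacian(X, k). Setting lam and mu to zero reduces the objective to plain tri-factorization reconstruction error.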