BoostCluster: Boosting Clustering by Pairwise Constraints |
|
Course URL: | http://videolectures.net/kdd07_liu_bc/ |
Lecturer: | Yi Liu |
Institution: | South Dakota State University |
Date: | 2007-09-14 |
Language: | English |
Abstract: | Data clustering is an important task in many disciplines. A large number of studies have attempted to improve clustering by using side information, which is often encoded as pairwise constraints. However, these studies focus on designing special clustering algorithms that can effectively exploit the pairwise constraints. We present a boosting framework for data clustering, termed BoostCluster, that is able to iteratively improve the accuracy of any given clustering algorithm by exploiting the pairwise constraints. The key challenge in designing a boosting framework for data clustering is how to influence an arbitrary clustering algorithm with the side information, since clustering algorithms are by definition unsupervised. The proposed framework addresses this problem by dynamically generating new data representations at each iteration that are, on the one hand, adapted to the clustering results of the given algorithm at previous iterations and, on the other hand, consistent with the given side information. Our empirical study shows that the proposed boosting framework is effective in improving the performance of a number of popular clustering algorithms (K-means, partitional SingleLink, spectral clustering), and that its performance is comparable to the state-of-the-art algorithms for data clustering with side information. |
Keywords: | data clustering; clustering algorithms; data information |
Source: | VideoLectures.NET |
Last reviewed: | 2020-06-08: 吴雨秋 (volunteer course editor) |
Views: | 48 |
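
The abstract describes side information encoded as pairwise constraints. As a rough illustration only, the sketch below assumes the encoding common in the constrained-clustering literature: "must-link" pairs (two points should share a cluster) and "cannot-link" pairs (two points should not). It scores how consistent a given clustering is with those constraints. The function name `constraint_satisfaction` and the toy data are hypothetical and do not come from the talk. This is not the BoostCluster procedure itself, only a way to make the notion of "consistent with the given side information" concrete.

```python
from typing import Iterable, Sequence, Tuple

# A pairwise constraint is a pair of data-point indices (assumed encoding).
Pair = Tuple[int, int]


def constraint_satisfaction(labels: Sequence[int],
                            must_link: Iterable[Pair],
                            cannot_link: Iterable[Pair]) -> float:
    """Return the fraction of pairwise constraints a clustering satisfies.

    labels[i] is the cluster id assigned to point i by any clustering
    algorithm. Hypothetical helper for illustration only.
    """
    satisfied = 0
    total = 0
    # A must-link pair is satisfied when both points share a cluster.
    for i, j in must_link:
        total += 1
        if labels[i] == labels[j]:
            satisfied += 1
    # A cannot-link pair is satisfied when the points are in different clusters.
    for i, j in cannot_link:
        total += 1
        if labels[i] != labels[j]:
            satisfied += 1
    # With no constraints there is nothing to violate.
    if total == 0:
        return 1.0
    return satisfied / total


# Toy example: five points with labels from some unsupervised algorithm.
labels = [0, 0, 1, 1, 1]
must_link = [(0, 1), (1, 2)]    # (1, 2) is violated: clusters 0 and 1
cannot_link = [(0, 4), (2, 3)]  # (2, 3) is violated: both in cluster 1
score = constraint_satisfaction(labels, must_link, cannot_link)
print(score)
```

A score like this could be used to compare a base algorithm's output before and after constraints are taken into account. How BoostCluster actually derives its new data representations from the constraints is not specified in this record.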