Self-taught Clustering
Course URL: http://videolectures.net/icml08_dai_stc/
Lecturer: Wenyuan Dai
Institution: Shanghai Jiao Tong University
Date: 2008-08-04
Language: English
Course description: This paper focuses on a new clustering task called self-taught clustering. Self-taught clustering is an instance of unsupervised transfer learning: it aims to cluster a small collection of unlabeled target data with the help of a large amount of unlabeled auxiliary data. The target and auxiliary data may differ in topic distribution. We show that even when the target data are not sufficient to learn a high-quality feature representation, useful features can still be learned with the help of the auxiliary data, and the target data can then be clustered effectively on those features. We propose a co-clustering-based self-taught clustering algorithm to tackle this problem. It clusters the target and auxiliary data simultaneously, so that the feature representation learned from the auxiliary data influences the target data through a common set of features. Under the new data representation, clustering of the target data can be improved. Our experiments on image clustering show that our algorithm can greatly outperform several state-of-the-art clustering methods when utilizing irrelevant unlabeled auxiliary data.
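The idea of letting auxiliary data shape a shared feature representation can be illustrated with a rough sketch. The code below is not the co-clustering algorithm presented in the lecture. It is a simplified stand-in that assumes plain k-means in place of co-clustering: features are grouped using target and auxiliary rows together, and the target rows are then clustered in the resulting feature-cluster space. All function and parameter names (`kmeans`, `self_taught_cluster_sketch`, `n_feature_clusters`, and so on) are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer label per row of `points`."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct rows (requires k <= number of rows).
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def self_taught_cluster_sketch(target, auxiliary, n_feature_clusters,
                               n_target_clusters, seed=0):
    """Simplified illustration (an assumption, not the lecture's method).

    target:    (n_target, n_features) array, the small data set to cluster
    auxiliary: (n_aux, n_features) array sharing the same feature columns
    """
    # Step 1: cluster the shared features (columns), using target AND
    # auxiliary rows, so the auxiliary data informs the feature grouping.
    combined = np.vstack([target, auxiliary])
    feature_labels = kmeans(combined.T, n_feature_clusters, seed=seed)

    # Step 2: re-represent each target row by summing its values within
    # each feature cluster (an empty feature cluster yields a zero column).
    reduced = np.stack(
        [target[:, feature_labels == c].sum(axis=1)
         for c in range(n_feature_clusters)],
        axis=1,
    )

    # Step 3: cluster the target rows in the new, lower-dimensional space.
    target_labels = kmeans(reduced, n_target_clusters, seed=seed)
    return target_labels, feature_labels
```

The method described in the abstract clusters the target data, the auxiliary data, and the features simultaneously. This sketch performs a single sequential pass instead, so it conveys only the role of the common feature set, not the full procedure.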
Keywords: self-taught clustering; unsupervised; transfer learning
Source: VideoLectures.NET
Last reviewed: 2019-04-18:cwx
Views: 101