Improving Clustering Stability with Combinatorial MRFs
Course URL: http://videolectures.net/kdd09_bekkerman_icswc/
Lecturer: Ron Bekkerman
Institution: University of Haifa
Date: 2009-09-14
Language: English
Course description: As clustering methods are often sensitive to parameter tuning, obtaining stability in clustering results is an important task. In this work, we aim at improving clustering stability by attempting to diminish the influence of algorithmic inconsistencies and enhance the signal that comes from the data. We propose a mechanism that takes $m$ clusterings as input and outputs $m$ clusterings of comparable quality, which are in higher agreement with each other. We call our method the Clustering Agreement Process (CAP). To preserve the clustering quality, CAP uses the same optimization procedure as the one used in clustering. In particular, we study the stability problem of randomized clustering methods (which usually produce different results at each run). We focus on methods that are based on inference in a combinatorial Markov Random Field (or Comraf, for short) of a simple topology. We instantiate CAP as inference within a more complex, bipartite Comraf. We test the resulting system on four datasets, three of which are medium-sized text collections, while the fourth is a large-scale user/movie dataset. First, in all four cases, our system significantly improves the clustering stability measured in terms of the macro-averaged Jaccard index. Second, in all four cases, our system also significantly improves clustering quality, achieving state-of-the-art results. Third, our system significantly improves the stability of consensus clustering built on top of the randomized clustering solutions.
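To make the notion of stability across runs concrete, the following is a minimal sketch of one common way to score agreement between clusterings with a Jaccard index. It assumes a pair-counting Jaccard averaged over all pairs of runs; this may differ from the macro-averaged Jaccard index used in the lecture, and the function names `pair_jaccard` and `mean_pairwise_stability` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two clusterings of the same items.

    Assumption: agreement is measured over item pairs, i.e. the number of
    pairs co-clustered in both runs divided by the number of pairs
    co-clustered in at least one run. Cluster label names do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    both = 0    # pairs placed together in both clusterings
    either = 0  # pairs placed together in at least one clustering
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    # If no pair is ever co-clustered, the two clusterings trivially agree.
    return both / either if either else 1.0


def mean_pairwise_stability(clusterings):
    """Average pair-counting Jaccard index over all pairs of runs."""
    if len(clusterings) < 2:
        raise ValueError("need at least two clusterings to measure stability")
    scores = [pair_jaccard(a, b) for a, b in combinations(clusterings, 2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three toy runs over four items; the first two differ only in label names.
    runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
    print(mean_pairwise_stability(runs))
```

Under this assumed definition, a score near 1 means that repeated runs of a randomized clustering method group the items almost identically, while a score near 0 means the runs rarely agree on which items belong together.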
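Keywords: Clustering Agreement Process; randomized clustering methods; consensus clustering
Source: VideoLectures.NET
Last reviewed: 2020-05-31: 吴雨秋 (volunteer course editor)
Views: 49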