A Hierarchical Information Theoretic Technique for the Discovery of Non Linear Alternative Clusterings
Course URL: http://videolectures.net/kdd2010_bailey_hitt/
Lecturer: James Bailey
Institution: University of Melbourne
Date: 2010-10-01
Language: English
Abstract: Discovery of alternative clusterings is an important method for exploring complex datasets. It lets the user view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired on datasets containing clusters with non-linear shapes. Our goal in this paper is to address this challenge of non-linearity. In particular, we propose a novel algorithm to uncover an alternative clustering that is distinctively different from an existing reference clustering. Our technique is information-theoretic: it aims to ensure the quality of the alternative clustering by maximizing the mutual information between clustering labels and data observations, while ensuring its distinctiveness by minimizing the information shared between the two clusterings. We perform experiments to assess our method against a large range of alternative clustering algorithms from the literature. We show that our technique generally performs better in non-linear scenarios and is highly competitive even in simpler, linear scenarios.
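The distinctiveness criterion in the abstract, minimizing the information shared between two clusterings, can be illustrated with a small sketch. The code below is not the algorithm from the paper; it only computes plain Shannon mutual information between two discrete labelings of the same points, which is one common way to quantify how much two clusterings overlap. The function name `mutual_information` and the toy label lists are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Shannon mutual information (in nats) between two labelings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same observations")
    n = len(labels_a)
    count_a = Counter(labels_a)                  # cluster sizes in clustering A
    count_b = Counter(labels_b)                  # cluster sizes in clustering B
    count_ab = Counter(zip(labels_a, labels_b))  # sizes of the pairwise overlaps
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # Each term is p(a, b) * log( p(a, b) / (p(a) * p(b)) ).
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy data: four points, one reference clustering, two candidates.
reference = [0, 0, 1, 1]
same_split = [1, 1, 0, 0]   # same partition as the reference, labels swapped
alternative = [0, 1, 0, 1]  # cuts across the reference clusters

print(mutual_information(reference, same_split))   # equals log 2: fully redundant
print(mutual_information(reference, alternative))  # equals 0: shares no information
```

In this toy setting a candidate that merely relabels the reference clusters scores the maximum possible value, while a candidate that splits every reference cluster evenly scores zero, which is the kind of clustering a distinctiveness term would favour.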
Keywords: alternative clustering; non-linear shapes; clustering
Source: VideoLectures.NET
Last reviewed: 2020-06-28 (yumf)