Adapting the Right Measures for K-Means Clustering
Course URL: http://videolectures.net/kdd09_wu_atrm/
Lecturer: Junjie Wu
Institution: Oregon State University
Date: 2009-09-14
Language: English
Abstract: Clustering validation is a long-standing challenge in the clustering literature. While many validation measures have been developed for evaluating the performance of clustering algorithms, these measures often provide inconsistent information about clustering performance, and the most suitable measures to use in practice remain unknown. This paper fills this gap with an organized study of 16 external validation measures for K-means clustering. Specifically, we first highlight the importance of measure normalization when evaluating clustering performance on data with imbalanced class distributions, and we provide normalization solutions for several measures. In addition, we summarize the major properties of these external measures; these properties can guide the selection of validation measures in different application scenarios. Finally, we reveal the interrelationships among these measures. Using mathematical transformations, we show that some validation measures are equivalent, and that some others have consistent validation performance. Most importantly, we provide a guideline for selecting the most suitable validation measures for K-means clustering.
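The abstract does not list the 16 measures or the normalization formulas. As an illustration only, the sketch below computes a few widely used external measures (purity, the Rand statistic, the Mirkin metric and the adjusted Rand index) from the class/cluster contingency counts n_ij. It is an assumption that these definitions match the paper's; the adjusted Rand index is the standard chance-corrected form of the Rand statistic and merely stands in for the paper's normalization, which may differ. All function names are hypothetical.

```python
from collections import Counter
from math import comb


def _counts(labels_true, labels_pred):
    """Return n, contingency counts n_ij, class sizes n_i. and cluster sizes n_.j."""
    n = len(labels_true)
    nij = Counter(zip(labels_true, labels_pred))
    ni = Counter(labels_true)
    nj = Counter(labels_pred)
    return n, nij, ni, nj


def _sum_sq(counter):
    # Sum of squared counts, used by the Mirkin metric.
    return sum(v * v for v in counter.values())


def purity(labels_true, labels_pred):
    # Fraction of points that belong to the majority class of their cluster.
    n, nij, _, _ = _counts(labels_true, labels_pred)
    best = {}
    for (_, j), v in nij.items():
        best[j] = max(best.get(j, 0), v)
    return sum(best.values()) / n


def rand_index(labels_true, labels_pred):
    # Fraction of point pairs on which classes and clusters agree
    # (together in both partitions, or apart in both).
    n, nij, ni, nj = _counts(labels_true, labels_pred)
    total = comb(n, 2)
    a = sum(comb(v, 2) for v in nij.values())
    same_class = sum(comb(v, 2) for v in ni.values())
    same_cluster = sum(comb(v, 2) for v in nj.values())
    return (total - same_class - same_cluster + 2 * a) / total


def mirkin(labels_true, labels_pred):
    # Mirkin metric: sum n_i.^2 + sum n_.j^2 - 2 * sum n_ij^2.
    _, nij, ni, nj = _counts(labels_true, labels_pred)
    return _sum_sq(ni) + _sum_sq(nj) - 2 * _sum_sq(nij)


def adjusted_rand_index(labels_true, labels_pred):
    # Chance-corrected Rand: (index - expected) / (max index - expected).
    n, nij, ni, nj = _counts(labels_true, labels_pred)
    a = sum(comb(v, 2) for v in nij.values())
    same_class = sum(comb(v, 2) for v in ni.values())
    same_cluster = sum(comb(v, 2) for v in nj.values())
    expected = same_class * same_cluster / comb(n, 2)
    max_index = (same_class + same_cluster) / 2
    if max_index == expected:
        # Both partitions trivial: the ratio is undefined in this sketch.
        raise ValueError("adjusted Rand index undefined for these partitions")
    return (a - expected) / (max_index - expected)


# Imbalanced classes (90 vs 10) and a degenerate one-cluster result.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
print(purity(y_true, y_pred))                 # 0.9
print(round(rand_index(y_true, y_pred), 3))   # 0.818
print(adjusted_rand_index(y_true, y_pred))    # 0.0
```

On this imbalanced example the unnormalized scores look respectable even though the "clustering" is a single trivial cluster, while the chance-corrected score drops to 0. The identity Rand = 1 − M / (n(n − 1)), where M is the Mirkin metric, holds for any pair of labelings, which is an example of the kind of equivalence by mathematical transformation the abstract mentions.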
Keywords: clustering literature; clustering algorithms; evaluation of clustering performance
Source: VideoLectures.NET
Last edited: 2020-06-29 (zyk)