一种用于极端聚类层次算法An Online Hierarchical Algorithm for Extreme Clustering |
|
课程网址: | http://videolectures.net/kdd2017_kobren_extreme_clustering/ |
主讲教师: | 信息不详。欢迎您在右侧留言补充。 |
开课单位: | 马萨诸塞大学 |
开课时间: | 2017-10-09 |
课程语种: | 英语 |
中文简介: | 许多现代聚类方法可以很好地扩展到大量数据项 N,但不能扩展到大量聚类 K。本文介绍了 PERCH,一种用于在线分层聚类的新非贪婪算法,可扩展到大量 N 和 K ——我们称之为极端聚类的问题设置。我们的算法有效地将新数据点路由到增量构建的树的叶子。出于对准确性和速度的渴望,我们的方法执行树旋转,以提高子树纯度并鼓励平衡。我们证明,在自然可分离性假设下,我们的非贪婪算法将生成具有完美树状图纯度的树,无论在线数据到达顺序如何。 |
课程简介: | Many modern clustering methods scale well to a large number of data items, N, but not to a large number of clusters, K. This paper introduces PERCH, a new non-greedy algorithm for online hierarchical clustering that scales to both massive N and K--a problem setting we term extreme clustering. Our algorithm efficiently routes new data points to the leaves of an incrementally-built tree. Motivated by the desire for both accuracy and speed, our approach performs tree rotations for the sake of enhancing subtree purity and encouraging balancedness. We prove that, under a natural separability assumption, our non-greedy algorithm will produce trees with perfect dendrogram purity regardless of online data arrival order. Our experiments demonstrate that PERCH constructs more accurate trees than other tree-building clustering algorithms and scales well with both N and K, achieving a higher quality clustering than the strongest flat clustering competitor in nearly half the time. |
关 键 词: | 现代聚类方法; 分层聚类; 数据科学 |
课程来源: | 视频讲座网 |
数据采集: | 2023-12-27:wujk |
最后编审: | 2023-12-27:wujk |
阅读次数: | 15 |