0


聚类过程分析

Analysis of Clustering Procedures
课程网址: http://videolectures.net/mlss09us_dasgupta_acp/  
主讲教师: Sanjoy Dasgupta
开课单位: 加州大学圣地亚哥分校
开课时间: 2009-07-30
课程语种: 英语
中文简介:
集群程序在严格的保证上是众所周知的短。在本教程中,我将介绍一些应用于集群的分析类型,并强调仍然存在的开放性问题。第一部分,聚类的两种常用的成本函数的近似算法是K-中心和K-均值。两者都是NP难以精确优化。(a)近似优化这些成本函数的算法。(b)此类分类的层次版本。(c)当数据以流式或在线方式到达时进行集群。第二部分。流行启发式分析(a)k-均值有多好?有多快?(b)em的概率分析。(c)对于层次聚类,通过聚集启发式方法获得什么近似比?第三部分:聚类中的统计理论:通过从该分布中聚类出有限样本,可以捕捉到基础数据分布的哪些方面?(a)k-均值的一致性。(b)聚类树和链接算法。(c)矢量量化的速率。
课程简介: Clustering procedures are notoriously short on rigorous guarantees. In this tutorial, I will cover some of the types of analysis that have been applied to clustering, and emphasize open problems that remain. Part I. Approximation algorithms for clustering Two popular cost functions for clustering are k-center and k-means. Both are NP-hard to optimize exactly. (a) Algorithms for approximately optimizing these cost functions. (b) Hierarchical versions of such clusterings. (c) Clustering when data is arriving in a streaming or online manner. Part II. Analysis of popular heuristics (a) How good is k-means? How fast is it? (b) Probabilistic analysis of EM. (c) What approximation ratio is achieved by agglomerative heuristics for hierarchial clustering? Part III. Statistical theory in clustering What aspects of the underlying data distribution are captured by the clustering of a finite sample from that distribution? (a) Consistency of k-means. (b) The cluster tree and linkage algorithms. (c) Rates for vector quantization.
关 键 词: 计算机科学; 机器学习; 聚类
课程来源: 视频讲座网
最后编审: 2021-02-03:nkq
阅读次数: 40