Persistence-based Clustering
Course URL: http://videolectures.net/solomon_skraba_pbc/
Lecturer: Primož Škraba
Institution: Jožef Stefan Institute
Date: 2010-03-26
Language: English
Abstract: Clustering is a classical problem that looks for important segments in an unstructured data set. In general, this is an ill-posed problem. A common approach is to consider the data set as a sample of an unknown probability distribution function on some underlying space. Clustering then becomes a problem of understanding the behaviour of the distribution function. In this talk, I will introduce persistence-based clustering. Under some mild assumptions, the algorithm comes with a variety of strong theoretical guarantees. In particular, it provably approximates the structure of the underlying distribution function even when the underlying space is only approximately known. The approach is based heavily on persistent homology (also referred to as topological persistence), a relatively recent development in the area of computational topology. It is precisely this framework that makes many of the proofs possible. The talk will include a general introduction to persistence, so no prior knowledge is expected. On the practical side, the algorithm is efficient in both memory and running time, so it can handle large, high-dimensional data sets quickly. Finally, it provides visual feedback in addition to the clusters, which is particularly useful when the data sets cannot be visualized.
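
The abstract describes the method only in words. Below is a minimal Python sketch of one way to realize the idea, not the lecturer's implementation. It assumes a crude k-nearest-neighbour density estimate for the unknown distribution, a radius-neighbourhood graph standing in for the approximately known underlying space, and a persistence threshold `tau`. All function names (`knn_density`, `neighbourhood_graph`, `cluster_graph`, `persistence_clustering`) are hypothetical. Points are processed in decreasing order of estimated density: each point joins the cluster of its highest-density neighbour, and two clusters are merged when the lower of their peaks rises less than `tau` above the current density level (graph-based mode seeking in the style of the ToMATo algorithm).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_density(points: Sequence[Point], k: int) -> List[float]:
    """Assumed density estimate: inverse distance to the k-th nearest neighbour."""
    dens = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dens.append(1.0 / (dists[k - 1] + 1e-12))
    return dens


def neighbourhood_graph(points: Sequence[Point], radius: float) -> List[List[int]]:
    """Stand-in for the approximately known space: join points within `radius`."""
    adj: List[List[int]] = [[] for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def cluster_graph(f: Sequence[float], adj: Sequence[Sequence[int]], tau: float
                  ) -> Tuple[List[int], List[Tuple[float, float]]]:
    """Mode seeking with union-find; returns (labels, persistence pairs)."""
    n = len(f)
    order = sorted(range(n), key=lambda i: (-f[i], i))  # highest density first
    rank = [0] * n
    for r, v in enumerate(order):
        rank[v] = r  # smaller rank = processed earlier
    parent = list(range(n))
    diagram: List[Tuple[float, float]] = []

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in order:
        higher = [j for j in adj[i] if rank[j] < rank[i]]
        if not higher:
            continue  # local peak of f: a new cluster is born at f[i]
        g = max(higher, key=lambda j: f[j])  # discrete gradient step uphill
        parent[i] = find(g)
        for j in higher:
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # The cluster whose peak was reached later (lower peak) is the one that may die.
            low, high = (ri, rj) if rank[ri] > rank[rj] else (rj, ri)
            if f[low] < f[i] + tau:  # its persistence f[low] - f[i] is below tau
                diagram.append((f[low], f[i]))
                parent[low] = high
    # Label every point by the peak (root) of the cluster it ended up in.
    return [find(i) for i in range(n)], diagram


def persistence_clustering(points: Sequence[Point], k: int, radius: float, tau: float):
    f = knn_density(points, k)
    return cluster_graph(f, neighbourhood_graph(points, radius), tau)


if __name__ == "__main__":
    blob = [(0.5 * x, 0.5 * y) for x in range(3) for y in range(3)]
    points = blob + [(px + 10.0, py) for px, py in blob]  # two far-apart 3x3 grids
    labels, diagram = persistence_clustering(points, k=3, radius=0.8, tau=0.5)
    print(len(set(labels)))  # prints 2
```

Calling `cluster_graph(f, adj, math.inf)` records every merge; plotting the returned (birth, death) pairs gives a persistence diagram, and choosing `tau` inside a visible gap between short-lived and long-lived peaks is one way to use the visual feedback the abstract mentions. The pairwise loops here are quadratic in the number of points, so handling large, high-dimensional data sets would require a proper nearest-neighbour search structure.
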
Keywords: Computer Science; Machine Learning; Clustering
Source: VideoLectures.NET
Last edited: 2020-07-23 (yumf)