Characterization of Linkage Based Clustering
Course URL: http://videolectures.net/nipsworkshops09_loker_clb/
Lecturer: David Loker
Institution: University of Waterloo
Date: 2010-01-19
Language: English
Course description: There is a wide variety of clustering algorithms that, when run on the same data, often produce very different clusterings. Yet there is no principled method to guide the selection of a clustering algorithm. The choice of an appropriate clustering is, of course, task dependent. As such, we must rely on domain knowledge. The challenge is to communicate such knowledge between the domain expert and the algorithm designer. One approach to providing guidance to clustering users in the selection of a clustering algorithm is to identify important properties that a user may want an algorithm to satisfy, and determine which algorithms satisfy each of these properties. Clustering users can then utilize prior knowledge to determine the properties that make sense for their application. Ultimately, there would be a sufficiently rich set of properties to provide detailed guidelines for a wide variety of clustering users. For a property to be useful, a user needs to be able to easily determine the desirability of the property. Such a description of clustering algorithms would yield principled guidelines for clustering algorithm selection by answering a series of simple questions. Bosagh Zadeh and Ben-David [1] make progress in this direction by providing a set of abstract properties that characterize single linkage. In this work, we give another result in the same direction by characterizing a family of clustering algorithms. These are initial steps toward the ambitious program of developing broad guidelines for clustering algorithm selection. Linkage-based clustering is one of the most commonly used and widely studied clustering paradigms. We provide a surprisingly simple set of properties that uniquely identify linkage-based clustering algorithms. Our characterization highlights how linkage-based algorithms compare to other clustering algorithms.
Combining previously proposed properties with our newly proposed ones, we show how these properties partition the space of commonly used clustering algorithms. Specifically, we show which of these properties are satisfied by common linkage-based, centroid-based, and spectral clustering algorithms. We hope that this analysis, as well as our characterization of linkage-based clustering, will provide useful guidelines for users in selecting clustering algorithms.
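The description above does not spell out the algorithm itself. As a rough illustration only, the sketch below assumes the standard agglomerative formulation of linkage-based clustering: start with singleton clusters and repeatedly merge the two clusters that a linkage function rates as closest. The names `linkage_cluster`, `single`, and `complete`, and the toy one-dimensional data, are hypothetical and not taken from the talk. It also shows the opening point of the description, that two algorithms run on the same data can return different clusterings.

```python
from itertools import combinations


def linkage_cluster(points, k, linkage):
    """Agglomerative sketch: merge the two closest clusters until k remain.

    `linkage` is any function that scores the distance between two clusters.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return sorted(sorted(c) for c in clusters)


def single(a, b):
    """Single linkage: distance between the closest pair of points."""
    return min(abs(x - y) for x in a for y in b)


def complete(a, b):
    """Complete linkage: distance between the farthest pair of points."""
    return max(abs(x - y) for x in a for y in b)


# Toy 1-D data with slowly growing gaps (an assumption for illustration).
points = [0, 10, 21, 33, 46, 60]

print(linkage_cluster(points, 2, single))    # [[0, 10, 21, 33, 46], [60]]
print(linkage_cluster(points, 2, complete))  # [[0, 10, 21, 33], [46, 60]]
```

On this data single linkage chains along the small gaps and isolates the last point, while complete linkage produces more balanced groups. This is the kind of behavioral difference that the property-based guidelines described above are meant to help users reason about.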
Keywords: clustering algorithms; clustering users; spectral clustering algorithms
Source: VideoLectures.NET
Last reviewed: 2019-09-07:lxf
Views: 46