PAC-Bayesian Approach to Formulation of Clustering Objectives
Course URL: http://videolectures.net/nipsworkshops09_seldin_pba/
Lecturer: Yevgeny Seldin
Institution: University of Copenhagen
Date: 2010-01-19
Language: English
Course description: Clustering is a widely used tool for exploratory data analysis. However, the theoretical understanding of clustering is very limited. We still do not have a well-founded answer to the seemingly simple question "how many clusters are present in the data?", and a formal comparison of clusterings based on different optimization objectives is far beyond our abilities. The lack of good theoretical support gives rise to multiple heuristics that confuse practitioners and stall the development of the field. We suggest that the ill-posed nature of clustering problems is caused by the fact that clustering is often taken out of its subsequent application context. We argue that one does not cluster the data just for the sake of clustering it, but rather to facilitate the solution of some higher-level task. By evaluating the clustering's contribution to the solution of the higher-level task, it is possible to compare different clusterings, even those obtained by different optimization objectives. In preceding work it was shown that such an approach can be applied to the evaluation and design of co-clustering solutions. Here we suggest that this approach can be extended to other settings where clustering is applied.
Keywords: clustering; data analysis; data
Source: VideoLectures.NET
Last reviewed: 2019-09-07:lxf