0


寻找低误差的分类

On Finding Low Error Clusterings
课程网址: http://videolectures.net/mlss09us_balcan_flec/  
主讲教师: Maria-Florina Balcan
开课单位: 乔治亚理工学院
开课时间: 2009-07-30
课程语种: 英语
中文简介:
在基于距离的目标函数(如K中值、K平均值和最小和目标)下,对聚类数据的近似算法进行了大量的研究。这项工作的部分原因是希望能够很好地接近这些目标,从而产生更准确的解决方案。也就是说,对于诸如按功能对蛋白质进行聚类或按主题对图像进行聚类等问题,存在一些未知的正确的目标聚类,而隐含的假设是,根据这些基于距离的度量,这些近似最优的聚类对于目标的误差也近似正确。得到。在这项工作中,我们表明,如果我们把这个隐式假设明确化——也就是说,如果我们假设给定的聚类目标phi的任何c-近似值都是epsilon接近目标值——那么我们就可以产生O(epsilon)——接近目标值的聚类,即使对于获得c-近似值是np困难的值c也是如此。特别是,对于k-中位数、k-均值和最小和目标,我们表明我们可以实现任何常数c>;1的保证。我们的结果表明,对于这些聚类目标,如何通过明智地使用手头问题的所有可用信息,可以比迄今为止在近似文献中获得的结果更好地保证准确性。
课程简介: There has been substantial work on approximation algorithms for clustering data under distance-based objective functions such as k-median, k-means, and min-sum objectives. This work is fueled in part by the hope that approximating these objectives well will indeed yield more accurate solutions. That is, for problems such as clustering proteins by function, or clustering images by subject, there is some unknown correct "target" clustering and the implicit assumption is that clusterings that are approximately optimal in terms of these distance-based measures are also approximately correct in terms of error with respect to the target. In this work we show that if we make this implicit assumption explicit - that is, if we assume that any c-approximation to the given clustering objective Phi is epsilon-close to the target - then we can produce clusterings that are O(epsilon)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for the k-median, k-means, and min-sum objectives, we show that we can achieve this guarantee for any constant c > 1. Our results shows how for these clustering objectives one can get much better guarantees on accuracy than those implied by results obtained so far in the approximation literature, by wisely using all the available information for the problem at hand.
关 键 词: 计算机科学; 机器学习; 聚类
课程来源: 视频讲座网
最后编审: 2021-01-30:nkq
阅读次数: 41