0


相似度函数的理论学习和聚类

On a Theory of Similarity Functions for Learning and Clustering
课程网址: http://videolectures.net/mlss09us_blum_tsflc/  
主讲教师: Avrim Blum
开课单位: 卡内基梅隆大学
开课时间: 信息不详。欢迎您在右侧留言补充。
课程语种: 英语
中文简介:
核心方法已经成为机器学习的有力工具。它们在许多应用程序中都表现良好,而且还有一个完善的理论,就是什么使给定的内核对给定的学习问题有用。然而,这一理论要求将内核视为隐式(通常难以描述)映射到高维空间中。在本文中,我将描述一个理论的开发工作,该理论将内核视为数据对象之间相似度的度量,并根据相似度函数如何与手头的任务相关的非常直观、直接的特性来描述给定内核(或更一般的相似度函数)的有用性,而无需参考任何隐式空格。我还将讨论这个框架的扩展,从纯粹的未标记数据(即集群)中学习。特别是,我们可以问相似函数的属性应该有多强(就其与未知的所需聚类的关系而言),以便能够很好地聚类:在完全没有标签信息的情况下学好。我们发现,如果我们愿意稍微放宽目标(例如,允许算法生成一个层次聚类,如果某些修剪接近所需的聚类,我们将称之为成功),那么这个问题将导致大量有趣的图论和博弈论属性,这些属性足以很好地进行聚类。这项工作可以看作定义了一种用于集群的PAC模型。(本次谈话基于与Maria Florina Balcan、Santosh Vempala和Nati Srebro的合作)。
课程简介: Kernel methods have become powerful tools in machine learning. They perform well in many applications, and there is also a well-developed theory of what makes a given kernel useful for a given learning problem. However, this theory requires viewing kernels as implicit (and often difficult to characterize) maps into high-dimensional spaces. In this talk I will describe work on developing a theory that just views a kernel as a measure of similarity between data objects, and describes the usefulness of a given kernel (or more general similarity function) in terms of fairly intuitive, direct properties of how the similarity function relates to the task at hand, without need to refer to any implicit spaces. I will also talk about an extension of this framework to learning from purely unlabeled data, i.e., clustering. In particular, one can ask how much stronger the properties of a similarity function should be (in terms of its relation to the unknown desired clustering) so that it can be used to cluster well: to learn well without any label information at all. We find that if we are willing to relax the objective a bit (for example, allow the algorithm to produce a hierarchical clustering that we will call successful if some pruning is close to the desired clustering), then this question leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well. This work can be viewed defining a kind of PAC model for clustering. (This talk based on work joint with Maria-Florina Balcan, Santosh Vempala, and Nati Srebro).
关 键 词: 相似性函数; 聚类学习; 机器学习
课程来源: 视频讲座网
最后编审: 2019-11-22:cwx
阅读次数: 29