A theory of similarity functions for learning and clustering
Course URL: http://videolectures.net/sicgt07_blum_atosf/
Lecturer: Avrim Blum
Institution: Carnegie Mellon University
Date: 2007-09-07
Language: English
Course description: Kernel methods have proven to be very powerful tools in machine learning. In addition, there is a well-developed theory of sufficient conditions for a kernel to be useful for a given learning problem. However, while a kernel function can be thought of as just a pairwise similarity function that satisfies additional mathematical properties, this theory requires viewing kernels as implicit (and often difficult to characterize) maps into high-dimensional spaces. In this talk I will describe a more general theory that applies to more general similarity functions (not just legal kernels) and furthermore describes the usefulness of a given similarity function in terms of more intuitive, direct properties of the induced weighted graph. An interesting feature of the proposed framework is that it can also be applied to learning from purely unlabeled data, i.e., clustering. In particular, one can ask how much stronger the properties of a similarity function should be (in terms of its relation to the unknown desired clustering) so that it can be used to *cluster* well. Investigating this question leads to a number of interesting graph-theoretic properties, and their analysis in the inductive setting uses regularity-lemma type results of [FK99,AFKK03]. This work is joint with Maria-Florina Balcan and Santosh Vempala.
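
The description above treats a similarity function as usable for learning even when it is not a legal (positive semidefinite) kernel. Below is a minimal sketch of the landmark-based construction often used to illustrate this idea: each point is mapped to its similarities with a small sample of landmark points, and a linear separator is learned over those features. The similarity function, toy data, and landmark count here are illustrative assumptions, not taken from the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def similarity_features(X, landmarks, sim):
    """Map each point x to the vector (sim(x, l_1), ..., sim(x, l_d))."""
    return np.array([[sim(x, l) for l in landmarks] for x in X])

# Illustrative similarity function that is not a legal kernel in general:
# negative L1 distance (not positive semidefinite).
def sim(x, y):
    return -np.linalg.norm(x - y, ord=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # toy data points
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy labels

# Landmarks drawn from the (unlabeled) data itself.
landmarks = X[rng.choice(len(X), size=20, replace=False)]

# Learn a linear separator in the similarity-feature space.
F = similarity_features(X, landmarks, sim)
clf = LogisticRegression(max_iter=1000).fit(F, y)
print("training accuracy:", clf.score(F, y))
```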
Keywords: kernel methods; machine learning; similarity functions
Source: VideoLectures.NET
Last reviewed: 2020-06-22: chenxin
Views: 58