Clustering Rankings in the Fourier Domain |
|
Course URL: | http://videolectures.net/ecmlpkdd2011_gaudel_clustering/ |
Lecturer: | Romaric Gaudel |
Institution: | French National Institute for Research in Computer Science and Automation (INRIA) |
Date: | 2011-10-03 |
Language: | English |
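Summary (translated from Chinese): | This paper introduces a new approach to clustering rank data over a set of possibly large cardinality n ∈ ℕ*, relying on the Fourier representation of functions defined on the symmetric group. In this setting, which covers a wide range of practical situations, rank data are treated as distributions. Cluster analysis aims to partition data into homogeneous subgroups that are, ideally, very different from one another in some sense. Measuring dissimilarity or distance between distributions on a noncommutative group coordinate-wise, for example by embedding them in [0,1]^(n!), rarely yields interpretable results and raises obvious computational problems; by contrast, assessing the closeness of groups of permutations in the Fourier domain can be much easier. In many situations, a few well-chosen Fourier (matrix) coefficients efficiently approximate two distributions and their degree of dissimilarity while describing global properties in an interpretable way. Following recent advances in automatic feature selection for unsupervised learning, the authors propose casting the clustering of rankings as the optimization of a criterion that is simple to express in the Fourier domain. The effectiveness of the method is illustrated by numerical experiments on artificial and real data. |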
Abstract: | It is the purpose of this paper to introduce a novel approach to clustering rank data on a set of possibly large cardinality n ∈ ℕ*, relying upon the Fourier representation of functions defined on the symmetric group S_n. In the present setup, which covers a wide variety of practical situations, rank data are viewed as distributions on S_n. Cluster analysis aims at segmenting data into homogeneous subgroups, hopefully very dissimilar in a certain sense. Considering dissimilarity measures or distances between distributions on the noncommutative group S_n in a coordinate manner, for instance by viewing it as embedded in the set [0,1]^(n!), hardly yields interpretable results and raises obvious computational issues. In contrast, evaluating the closeness of groups of permutations in the Fourier domain may be much easier. Indeed, in a wide variety of situations, a few well-chosen Fourier (matrix) coefficients may suffice to approximate efficiently two distributions on S_n as well as their degree of dissimilarity, while describing global properties in an interpretable fashion. Following in the footsteps of recent advances in automatic feature selection in the context of unsupervised learning, we propose to cast the task of clustering rankings in terms of the optimization of a criterion that can be expressed in the Fourier domain in a simple manner. The effectiveness of the proposed method is illustrated by numerical experiments based on artificial and real data. |
Keywords: | Fourier; data clustering; clustering rankings |
Source: | VideoLectures.NET |
Last reviewed: | 2020-06-08:cxin |
Views: | 97 |
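
To make the abstract's remark about "a few well-chosen Fourier coefficients" concrete, here is a minimal sketch that is not taken from the lecture. It relies on a standard fact: the lowest-order, nontrivial Fourier information of a distribution on S_n is captured by the n × n matrix of first-order marginals, P(item i receives rank j), up to a choice of basis and transposition convention. The sample rankings, the function names, and the use of the Frobenius norm as a dissimilarity are illustrative assumptions, not the criterion proposed by the authors.

```python
from itertools import permutations


def first_order_marginals(rankings, n):
    """Return the n x n matrix M with M[i][j] = fraction of rankings
    in which item i receives rank j (rankings[k][i] is the rank of item i)."""
    weight = 1.0 / len(rankings)
    m = [[0.0] * n for _ in range(n)]
    for ranking in rankings:
        for item, rank in enumerate(ranking):
            m[item][rank] += weight
    return m


def frobenius_distance(a, b):
    """Frobenius norm of the difference of two equally sized matrices."""
    return sum(
        (x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)
    ) ** 0.5


n = 3
# Hypothetical samples: sample_a mostly ranks item 0 first,
# sample_b mostly ranks item 0 last.
sample_a = [(0, 1, 2), (0, 1, 2), (0, 2, 1), (1, 0, 2)]
sample_b = [(2, 1, 0), (2, 1, 0), (1, 2, 0), (2, 0, 1)]
# All n! rankings once each, i.e. the uniform distribution on S_n.
uniform = list(permutations(range(n)))

m_a = first_order_marginals(sample_a, n)
m_b = first_order_marginals(sample_b, n)
m_u = first_order_marginals(uniform, n)

d_ab = frobenius_distance(m_a, m_b)
d_au = frobenius_distance(m_a, m_u)
print("distance(a, b) =", d_ab)
print("distance(a, uniform) =", d_au)
```

Each sample is summarized by n² numbers instead of the n! coordinates of the full distribution, and each entry has a direct reading (how often an item lands at a given rank), which is the kind of interpretability the abstract refers to.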