Nearest Neighbors in High-Dimensional Data: The Emergence and Influence of Hubs
Course URL: https://videolectures.net/videos/icml09_radovanovic_nnh
Lecturer: Miloš Radovanović
Host organization: Conference
Date: 2009-08-26
Language: English
Course description: High dimensionality can pose severe difficulties, widely recognized as different aspects of the curse of dimensionality. In this paper we study a new aspect of the curse pertaining to the distribution of k-occurrences, i.e., the number of times a point appears among the k nearest neighbors of other points in a data set. We show that, as dimensionality increases, this distribution becomes considerably skewed and hub points emerge (points with very high k-occurrences). We examine the origin of this phenomenon, showing that it is an inherent property of high-dimensional vector space, and explore its influence on applications based on measuring distances in vector spaces, notably classification, clustering, and information retrieval.
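
The k-occurrence count defined in the description can be illustrated with a minimal sketch. It is not taken from the lecture: the function names `k_occurrences` and `skewness`, the use of i.i.d. Gaussian data, Euclidean distance, and the sizes (1000 points, k = 5, dimensions 3 and 100) are all assumptions chosen for illustration.

```python
import numpy as np


def k_occurrences(X, k):
    """Count, for each row of X, how often it appears among the
    k nearest neighbors (Euclidean) of the other rows. Requires k < len(X)."""
    n = X.shape[0]
    # Squared pairwise distances via the Gram-matrix identity.
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    # Indices of the k smallest distances in each row (order within the k is arbitrary).
    knn = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=n)


def skewness(v):
    """Standardized third moment; positive values indicate a long right tail."""
    v = np.asarray(v, dtype=float)
    c = v - v.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # fixed seed, illustrative only
    for d in (3, 100):
        X = rng.standard_normal((1000, d))
        counts = k_occurrences(X, k=5)
        print(f"d={d}: max k-occurrence={counts.max()}, skewness={skewness(counts):.2f}")
```

Every point contributes exactly k neighbor slots, so the counts always average k. What changes with dimensionality is the shape of the distribution: a larger maximum and a larger positive skewness in the higher-dimensional run would correspond to the hub points mentioned in the description.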
Keywords: high-dimensional data; curse of dimensionality; data sets
Source: VideoLectures.NET
Data collected: 2025-04-25:liyq
Last reviewed: 2025-04-25:liyq
Views: 6