Dirichlet Component Analysis: Feature Extraction for Compositional Data
Course URL: http://videolectures.net/icml08_wang_dca/
Lecturer: Hua-Yan Wang
Institution: Peking University
Date: 2008-07-29
Language: English
Course description: We consider feature extraction (dimensionality reduction) for compositional data, where the data vectors are constrained to be positive and to sum to a constant. In real-world problems, the data components (variables) usually have complicated "correlations" while their total number is huge. Such a scenario demands feature extraction: we must de-correlate the components and reduce their dimensionality. Traditional techniques such as Principal Component Analysis (PCA) are not suitable for these problems because of the unique statistical properties of compositional data and the need to satisfy its constraints. This paper presents a novel approach to feature extraction for compositional data. Our method first identifies a family of dimensionality reduction projections that preserve all relevant constraints, and then finds the optimal projection, the one that maximizes the estimated Dirichlet precision on the projected data. It reduces the compositional data to a given lower dimensionality while keeping the components in the lower-dimensional space as de-correlated as possible. We develop a theoretical foundation for our approach and validate its effectiveness on synthetic and real-world datasets.
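The two-step idea in the description (restrict to constraint-preserving projections, then pick the one with the highest estimated Dirichlet precision) can be illustrated with a small sketch. This is not the authors' algorithm. It assumes the simplest constraint-preserving projection, amalgamation (summing groups of components), a method-of-moments precision estimate averaged over components, and plain random search. The names `project`, `dirichlet_precision`, and `random_search` are hypothetical helpers introduced only for this illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def project(x, groups, k):
    """Amalgamate components: sum the entries assigned to the same group.

    Sums of positive parts stay positive and the total is unchanged,
    so the result is still a valid composition (assumed projection family).
    """
    y = [0.0] * k
    for xi, gi in zip(x, groups):
        y[gi] += xi
    return y


def dirichlet_precision(data):
    """Method-of-moments estimate of the Dirichlet precision s = sum(alpha).

    For a Dirichlet, Var(x_j) = m_j * (1 - m_j) / (s + 1), so each component
    gives an estimate s = m_j * (1 - m_j) / v_j - 1; we average them.
    """
    n = len(data)
    estimates = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        m = sum(col) / n
        v = sum((c - m) ** 2 for c in col) / (n - 1)
        if v > 0:
            estimates.append(m * (1 - m) / v - 1)
    if not estimates:
        raise ValueError("all components have zero variance")
    return sum(estimates) / len(estimates)


def random_search(data, k, trials, rng):
    """Try random groupings into k non-empty groups; keep the highest score."""
    dim = len(data[0])
    best_score, best_groups = None, None
    for _ in range(trials):
        # The first k labels guarantee that every group is non-empty.
        groups = list(range(k)) + [rng.randrange(k) for _ in range(dim - k)]
        rng.shuffle(groups)
        projected = [project(x, groups, k) for x in data]
        score = dirichlet_precision(projected)
        if best_score is None or score > best_score:
            best_score, best_groups = score, groups
    return best_score, best_groups


def demo():
    rng = random.Random(0)
    data = []
    for _ in range(500):
        # Synthetic data: 3 latent parts, each split into 2 correlated parts.
        z = sample_dirichlet([4.0, 4.0, 4.0], rng)
        x = []
        for zj in z:
            u = rng.uniform(0.2, 0.8)
            x.extend([zj * u, zj * (1 - u)])
        data.append(x)
    score, groups = random_search(data, k=3, trials=200, rng=rng)
    print("best grouping:", groups, "estimated precision:", round(score, 2))


if __name__ == "__main__":
    demo()
```

A higher estimated precision means the projected components behave more like an uncorrelated Dirichlet sample, which is the sense in which the description speaks of de-correlation. The projection family and the search strategy presented in the lecture may well differ from this simplified stand-in.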
Keywords: principal component analysis; Dirichlet process; dimensionality-reduction projection
Source: VideoLectures.NET
Last reviewed: 2020-07-06:heyf
Views: 64