Unsupervised Feature Selection for Multi-Cluster Data
Course URL: http://videolectures.net/kdd2010_cai_ufsm/
Lecturer: Deng Cai
Institution: Zhejiang University
Date: 2010-10-01
Course language: Simplified Chinese
Course description: In many data analysis tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find a relevant subset of the original features that can facilitate clustering, classification, and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult because there are no class labels to guide the search for relevant information. The feature selection problem is essentially a combinatorial optimization problem, which is computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top-ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be solved efficiently, since it involves only a sparse eigen-problem and an L1-regularized least squares problem. Extensive experimental results on various real-life data sets demonstrate the superiority of the proposed algorithm.
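The two-step structure named in the description (an eigen-problem followed by L1-regularized least squares) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a symmetric 0/1 k-nearest-neighbour graph, a dense eigen-solver in place of a sparse one, a plain coordinate-descent Lasso as the L1 solver, and a per-feature score equal to the largest absolute regression coefficient. The function names and the parameters `k` and `alpha` are hypothetical choices made for illustration.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix (assumed weighting)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)

def spectral_embedding(W, n_clusters):
    """Bottom eigenvectors of L y = lambda D y, centred and rescaled."""
    n = W.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)           # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs[:, :n_clusters]
    Y = Y - Y.mean(axis=0)                    # a constant vector becomes ~0
    norms = np.linalg.norm(Y, axis=0)
    keep = norms > 1e-8                       # drop the trivial constant vector
    return Y[:, keep] / norms[keep] * np.sqrt(n)

def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                # residual y - Xw
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]               # remove feature j's contribution
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

def mcfs(X, n_selected, n_clusters, k=10, alpha=0.05):
    """Return indices of the selected features and all feature scores."""
    std = X.std(axis=0)
    std[std == 0.0] = 1.0
    Xs = (X - X.mean(axis=0)) / std           # standardised features
    Y = spectral_embedding(knn_graph(X, k), n_clusters)
    A = np.column_stack([lasso_cd(Xs, Y[:, c], alpha) for c in range(Y.shape[1])])
    scores = np.abs(A).max(axis=1)            # one score per feature
    return np.argsort(-scores)[:n_selected], scores
```

Calling `mcfs(X, n_selected=2, n_clusters=2)` on an `(n_samples, n_features)` array returns the indices of the two highest-scoring features. Because each regression uses all features jointly, correlated features compete for weight rather than being scored independently, which is the contrast with per-feature ranking drawn in the description.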
Keywords: data analysis; feature selection; unsupervised learning; combinatorial optimization
Source: VideoLectures.NET
Last reviewed: 2019-12-21:lxf
Views: 172