Feature Selection Stability Assessment based on the Jensen-Shannon Divergence
Course URL: http://videolectures.net/ecmlpkdd2011_alaiz_rodriguez_feauture/
Lecturer: Rocío Alaiz-Rodriguez
Institution: University of León
Date: 2011-10-03
Language: English
Course description: Feature selection and ranking techniques play an important role in the analysis of high-dimensional data. In particular, their stability becomes crucial when feature importance is later studied in order to better understand the underlying process. The fact that a small change in the dataset may affect the outcome of a feature selection/ranking algorithm has long been overlooked in the literature. We propose an information-theoretic approach that uses the Jensen-Shannon divergence to assess this stability (or robustness). Unlike other measures, this new metric is suitable for different algorithm outcomes: full ranked lists, partial sublists (top-k lists), and the least studied case, partial ranked lists. This generalized metric measures the disagreement among a whole set of lists of the same size, follows a probabilistic approach, and can give more importance to differences that appear at the top of the list. We illustrate it and compare it with popular metrics such as the Spearman rank correlation and Kuncheva's index, both on artificially generated feature selection/ranking outcomes and on a spectral fat dataset with different filter-based feature selectors.
Keywords: feature selection; ranking techniques; information-theoretic approach; generalized metric
Source: VideoLectures.NET
Last reviewed: 2020-06-08:wuyq
Views: 81
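
Illustrative sketch: the course description names the Jensen-Shannon divergence as the basis of the stability measure but does not give its exact form. The Python below is a minimal sketch under stated assumptions, not the lecturer's implementation. It assumes each ranked list is turned into a probability distribution by giving the feature at rank r a weight proportional to 1/r, and that stability is reported as 1 minus the divergence divided by log2(m) for m lists. The function names rank_to_probs, js_divergence and stability are hypothetical.

```python
import math


def rank_to_probs(ranking):
    """Map a ranked list (best feature first) to a probability distribution.

    Assumption (not from the source): rank r receives weight 1/r, so
    features near the top of the list carry more probability mass.
    """
    weights = {feat: 1.0 / (pos + 1) for pos, feat in enumerate(ranking)}
    total = sum(weights.values())
    return {feat: w / total for feat, w in weights.items()}


def entropy(dist):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(dists):
    """Equal-weight Jensen-Shannon divergence among several distributions:
    entropy of the average distribution minus the average entropy."""
    m = len(dists)
    feats = set().union(*dists)
    mixture = {f: sum(d.get(f, 0.0) for d in dists) / m for f in feats}
    return entropy(mixture) - sum(entropy(d) for d in dists) / m


def stability(rankings):
    """Hypothetical stability score in [0, 1]; 1 means identical lists.

    Assumption: the divergence is scaled by its upper bound log2(m).
    """
    if len(rankings) < 2:
        raise ValueError("need at least two ranked lists")
    dists = [rank_to_probs(r) for r in rankings]
    return 1.0 - js_divergence(dists) / math.log2(len(dists))


if __name__ == "__main__":
    base = ["a", "b", "c", "d"]
    print(stability([base, ["b", "a", "c", "d"]]))  # swap at the top
    print(stability([base, ["a", "b", "d", "c"]]))  # swap at the bottom
```

Under these assumptions, a swap at the top of the list lowers the score more than a swap at the bottom, which matches the property highlighted in the course description. Because missing features are treated as having probability zero, the same functions also accept top-k lists that contain different features.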