

Oblique Random Forests

On oblique random forests
Lecture URL: https://videolectures.net/videos/ecmlpkdd2011_menze_forests  
Lecturer: Bjoern H. Menze
Venue: 2011 conference
Date: 2011-11-30
Language: English
Description: In his original paper on random forests, Breiman proposed two different decision tree ensembles: one generated from "orthogonal" trees with thresholds on individual features in every split, and one from "oblique" trees separating the feature space by randomly oriented hyperplanes. In spite of a rising interest in the random forest framework, however, ensembles built from orthogonal trees (RF) have gained most, if not all, attention so far. In the present work we propose to employ "oblique" random forests (oRF) built from multivariate trees which explicitly learn optimal split directions at internal nodes using linear discriminative models, rather than using random coefficients as in the original oRF. This oRF outperforms RF, as well as other classifiers, on nearly all data sets except those with discrete factorial features. Learned node models perform distinctly better than random splits. An oRF feature importance score proves preferable to standard RF feature importance scores such as Gini or permutation importance. The topology of the oRF decision space appears to be smoother and better adapted to the data, resulting in improved generalization performance. Overall, the oRF proposed here may be preferred over standard RF on most learning tasks involving numerical and spectral data.
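The core idea in the abstract — learning an oblique split direction at an internal node with a linear model instead of thresholding a single feature — can be sketched as follows. This is a minimal illustration, assuming ridge regression on ±1 class labels as the linear node model; it is not the authors' implementation, and the function and variable names are invented for this example:

```python
import numpy as np

def learn_oblique_split(X, y, lam=1.0):
    """Fit an oblique split at a tree node: ridge regression on +/-1
    class labels gives the hyperplane normal; samples are routed by the
    sign of their centred projection onto that direction."""
    t = np.where(y == 1, 1.0, -1.0)      # +/-1 regression targets
    mu = X.mean(axis=0)
    Xc = X - mu                          # centre the features
    d = X.shape[1]
    # closed-form ridge solution: (Xc'Xc + lam*I) w = Xc' t
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ t)
    go_left = (Xc @ w) <= 0.0            # split at the centred origin
    return w, mu, go_left

# toy two-class data separated by a diagonal (oblique) boundary,
# where any single-feature ("orthogonal") threshold is suboptimal
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)),
               rng.normal(+1.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, mu, go_left = learn_oblique_split(X, y)
# fraction of the majority class on the left branch (split purity)
purity = max(np.mean(y[go_left] == 0), np.mean(y[go_left] == 1))
```

In a full oRF, each tree would apply such a learned split recursively on a random feature subset at every internal node, and an ensemble of such trees would vote; the random-coefficient variant Breiman described would instead draw `w` at random.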
Keywords: oblique random forests; original paper; decision tree ensembles
Source: VideoLectures.NET
Data collected: 2024-10-22: liyq
Last reviewed: 2024-10-22: liyq
Views: 6