Fast Flux Discriminant for Large-Scale Sparse Nonlinear Classification
Course URL: http://videolectures.net/kdd2014_chen_nonlinear_classification/
Lecturer: Wenlin Chen
Institution: Washington University in St. Louis
Date: 2014-10-08
Language: English
Abstract: In this paper, we propose a novel supervised learning method, Fast Flux Discriminant (FFD), for large-scale nonlinear classification. Compared with other existing methods, FFD has unmatched advantages, as it attains the efficiency and interpretability of linear models as well as the accuracy of nonlinear models. It is also sparse and naturally handles mixed data types. It works by decomposing the kernel density estimation in the entire feature space into selected low-dimensional subspaces. Since there are many possible subspaces, we propose a submodular optimization framework for subspace selection. The selected subspace predictions are then transformed to new features on which a linear model can be learned. Besides, since the transformed features naturally expect non-negative weights, we only require smooth optimization even with the L1 regularization. Unlike other nonlinear models such as kernel methods, the FFD model is interpretable as it gives importance weights on the original features. Its training and testing are also much faster than traditional kernel models. We carry out extensive empirical studies on real-world datasets and show that the proposed model achieves state-of-the-art classification results with sparsity, interpretability, and exceptional scalability. Our model can be learned in minutes on datasets with millions of samples, for which most existing nonlinear methods will be prohibitively expensive in space and time.
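
The pipeline outlined in the abstract (density estimates in low-dimensional subspaces, a transform into new features, then a linear model with non-negative weights) can be illustrated with a small sketch. This is not the authors' implementation. The Gaussian kernel, the log-odds transform, the toy XOR-style data, the hyperparameters, and all function names below are assumptions made for illustration. The submodular subspace selection step is not reproduced: every single feature and every feature pair is kept as a candidate, and sparsity comes only from the L1 penalty. Because the weights are constrained to be non-negative, the L1 term reduces to `lam * sum(w)`, which is linear in `w`, so plain projected gradient descent is enough.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def make_xor(n, d=4):
    """Toy data (assumption): the label depends only on the signs of features 0 and 1."""
    x = rng.normal(size=(n, d))
    y = (x[:, 0] * x[:, 1] > 0).astype(float)
    return x, y

def kde_log_odds(ref_x, ref_y, query_x, bandwidth=0.5):
    """Log-odds of class 1 vs class 0 from Gaussian KDEs inside one subspace."""
    def log_density(points):
        # squared distance between every query point and every reference point
        d2 = ((query_x[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        # the Gaussian normalising constant is shared by both classes and cancels
        return np.log(np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1) + 1e-12)
    return log_density(ref_x[ref_y == 1]) - log_density(ref_x[ref_y == 0])

def subspace_features(ref_x, ref_y, query_x, subspaces):
    """One new feature per subspace: the KDE log-odds restricted to its columns."""
    cols = [kde_log_odds(ref_x[:, list(s)], ref_y, query_x[:, list(s)])
            for s in subspaces]
    return np.column_stack(cols)

def fit_nonneg_l1_logistic(features, y, lam=0.05, lr=0.05, steps=1000):
    """Logistic regression with w >= 0; the L1 penalty is then lam * sum(w)."""
    n, m = features.shape
    w, b = np.zeros(m), 0.0
    for _ in range(steps):
        z = np.clip(features @ w + b, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-z))
        grad_w = features.T @ (p - y) / n + lam  # gradient of loss + lam * sum(w)
        grad_b = float((p - y).mean())
        w = np.maximum(w - lr * grad_w, 0.0)     # project back onto w >= 0
        b -= lr * grad_b
    return w, b

# candidate subspaces: every single feature and every pair of features
subspaces = [s for k in (1, 2) for s in itertools.combinations(range(4), k)]

x_ref, y_ref = make_xor(600)    # used only for the density estimates
x_fit, y_fit = make_xor(600)    # used only to fit the linear model
x_test, y_test = make_xor(600)  # held out for evaluation

f_fit = subspace_features(x_ref, y_ref, x_fit, subspaces)
w, b = fit_nonneg_l1_logistic(f_fit, y_fit)

f_test = subspace_features(x_ref, y_ref, x_test, subspaces)
accuracy = ((f_test @ w + b > 0) == (y_test == 1)).mean()
best_subspace = subspaces[int(np.argmax(w))]
print(best_subspace, round(float(accuracy), 3), np.round(w, 3))
```

In this toy setup the learned weight vector `w` has one entry per subspace, so a large entry points directly at the original features that matter. That is the sense in which the abstract calls the model interpretable. Separate data splits are used for the density estimates and the linear fit so that the log-odds features are not computed on the same points they were estimated from.
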
Keywords: low-dimensional subspaces; supervised learning; sample datasets
Source: VideoLectures.NET
Data collected: 2021-06-09:zyk
Last reviewed: 2021-06-09:zyk