Does SVM Really Scale Up to Large Bag of Words Feature Spaces?
Course URL: http://videolectures.net/ida07_colas_dsvmrs/
Lecturer: Fabrice Colas
Institution: Leiden University
Date: 2007-10-08
Language: English
Course description: We are concerned with the problem of learning classification rules in text categorization, where many authors have presented Support Vector Machines (SVM) as the leading classification method. A number of studies, however, have repeatedly pointed out that in some situations SVM is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule. In this paper, we aim at developing a better understanding of SVM behaviour in typical text categorization problems represented by sparse bag of words feature spaces. We study in detail the performance and the number of support vectors when varying the training set size, the number of features and, unlike existing studies, also the SVM free parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show that SVM solutions with small C are high performers. However, most training documents are then bounded support vectors sharing the same weight C. Thus, the SVM reduces to a nearest mean classifier; this raises an interesting question about the merits of SVM in sparse bag of words feature spaces. Additionally, SVM suffers from performance deterioration for particular combinations of training set size and number of features.
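
To make the central claim concrete: in the SVM dual, each Lagrange multiplier is box-constrained to 0 <= alpha_i <= C, so if C is small enough that (nearly) all training documents sit at the bound alpha_i = C, the weight vector w = sum_i alpha_i y_i x_i = C (sum over positive documents of x_i minus sum over negative documents of x_i), which for balanced classes is proportional to the difference of the class mean vectors, i.e. a nearest mean classifier. The sketch below is not part of the lecture; the dataset (two 20 Newsgroups categories) and the C values are illustrative assumptions. It counts bounded support vectors at a small and a moderate C and compares against a nearest-centroid baseline using scikit-learn.

```python
# A minimal sketch, assuming scikit-learn is available; dataset, categories,
# and C values are illustrative choices, not the paper's exact setup.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestCentroid
from sklearn.svm import SVC

cats = ["rec.autos", "sci.space"]            # assumed binary task
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = CountVectorizer()                      # sparse bag of words features
Xtr = vec.fit_transform(train.data)
Xte = vec.transform(test.data)

for C in (1e-4, 1.0):
    svm = SVC(kernel="linear", C=C).fit(Xtr, train.target)
    # dual_coef_ stores y_i * alpha_i; a support vector is "bounded"
    # when its alpha_i sits at the upper bound C
    alphas = np.abs(svm.dual_coef_).ravel()
    n_bounded = int(np.isclose(alphas, C).sum())
    acc = svm.score(Xte, test.target)
    print(f"C={C:g}: {n_bounded}/{alphas.size} bounded SVs, test acc={acc:.3f}")

# Nearest mean (centroid) baseline that a small-C SVM is claimed to approach
nm = NearestCentroid().fit(Xtr, train.target)
print(f"nearest mean: test acc={nm.score(Xte, test.target):.3f}")
```

If, at the small C, nearly every support vector is bounded while test accuracy stays close to the nearest-centroid baseline, that reproduces the paper's observation in miniature.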
Keywords: text categorization; support vector machines; word features
Source: VideoLectures.NET
Last reviewed: 2019-04-27: cwx
Views: 54