Explanation of SVM's behaviour in text classification
Course URL: http://videolectures.net/solomon_colas_esvm/
Lecturer: Fabrice Colas
Institution: Leiden University
Date: 2007-08-24
Language: English
Course description: We are concerned with the problem of learning classification rules in text categorization, where many authors have presented Support Vector Machines (SVM) as the leading classification method. A number of studies, however, have repeatedly pointed out that in some situations SVM is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule. In this paper, we aim at developing a better understanding of SVM behaviour in typical text categorization problems represented by sparse bag-of-words feature spaces. We study in detail the performance and the number of support vectors when varying the training set size, the number of features and, unlike existing studies, also the SVM free parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show that SVM solutions with small C are high performers. However, most training documents are then bounded support vectors sharing the same weight C. Thus, the SVM reduces to a nearest mean classifier; this raises an interesting question about the merits of SVM in sparse bag-of-words feature spaces. Additionally, SVM suffers from performance deterioration for particular training set size/number of features combinations.
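The central claim above (with a very small C, nearly all training documents become bounded support vectors with the same weight, so the linear SVM behaves like a nearest mean classifier) can be illustrated with a small sketch. This is not code from the lecture; it assumes scikit-learn and uses a synthetic sparse term-count dataset standing in for bag-of-words documents:

```python
# Sketch (not from the lecture): a linear SVM with a tiny C on sparse,
# high-dimensional data, compared against a nearest centroid (nearest mean)
# classifier. Dataset and all names here are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
n, d = 200, 500
# Two classes of sparse "documents": mostly-zero Poisson term counts,
# with class 1 enriched in the first 50 vocabulary terms.
X = rng.poisson(0.05, size=(n, d)).astype(float)
y = np.repeat([0, 1], n // 2)
X[y == 1, :50] += rng.poisson(0.3, size=(n // 2, 50))

# Very small C: the dual variables all hit the upper bound C, so the
# weight vector is proportional to the difference of the class sums.
svm_small_c = SVC(kernel="linear", C=1e-4).fit(X, y)
centroid = NearestCentroid().fit(X, y)

# With tiny C, (almost) every training point is a bounded support vector.
sv_fraction = len(svm_small_c.support_) / n
# The two decision rules should then largely coincide.
agree = np.mean(svm_small_c.predict(X) == centroid.predict(X))
print(f"support vector fraction: {sv_fraction:.2f}")
print(f"agreement with nearest centroid: {agree:.2f}")
```

Raising C toward typical values (e.g. C=1) frees some support vectors from the bound and lets the SVM deviate from the pure nearest-mean rule, which is where the paper locates the interesting regime.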
Keywords: classification rules; bag-of-words feature space; Lagrange multiplier upper bound
Source: VideoLectures.NET
Last reviewed: 2019-09-21:cwx
Views: 34