Leveraging Higher Order Dependencies Between Features for Text Classification
Course URL: http://videolectures.net/ecmlpkdd09_pottenger_lhodbftc/
Lecturer: William M. Pottenger
Institution: Rutgers, The State University of New Jersey
Date: 2009-10-20
Language: English
Abstract: Traditional machine learning methods consider only the relationships between feature values within individual data instances and disregard the dependencies that link features across instances. In this work, we develop a general approach to supervised learning by leveraging higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
Keywords: machine learning; cross-instance links; Higher Order Naive Bayes
Source: VideoLectures.NET
Last reviewed: 2019-03-27:lxf
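
The abstract contrasts first-order relations (features co-occurring within one instance) with higher-order relations that link features across instances. The sketch below illustrates only that distinction on a toy corpus. It is not the HONB model or the space transformation from the lecture: the function names `first_order_pairs` and `second_order_pairs`, the toy documents, and the simplified rule "two terms are second-order related if they share a co-occurring neighbor but never appear together" are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def first_order_pairs(docs):
    """Unordered term pairs that co-occur inside at least one document."""
    pairs = set()
    for doc in docs:
        # Sorting gives each pair a canonical (a, b) order with a < b.
        pairs.update(combinations(sorted(set(doc)), 2))
    return pairs


def second_order_pairs(docs):
    """Term pairs linked through a shared neighbor but never seen together.

    Simplified illustration (an assumption, not the lecture's definition):
    a and c are second-order related if some term b co-occurs with both,
    while a and c do not co-occur in any single document.
    """
    first = first_order_pairs(docs)

    # Build the co-occurrence graph: term -> terms it appears with.
    neighbors = defaultdict(set)
    for a, b in first:
        neighbors[a].add(b)
        neighbors[b].add(a)

    second = set()
    for linked in neighbors.values():
        # Any two neighbors of the same term are connected by a 2-step path.
        for a, c in combinations(sorted(linked), 2):
            if (a, c) not in first:
                second.add((a, c))
    return second


# Hypothetical toy corpus: each document is a list of terms.
docs = [
    ["bayes", "classifier"],
    ["classifier", "text"],
    ["text", "corpus"],
]

print(sorted(first_order_pairs(docs)))
print(sorted(second_order_pairs(docs)))
```

In this toy corpus, "bayes" and "text" never share a document, so a first-order method sees no relation between them, yet they are linked through "classifier". Relations of this kind, which span several instances, are what the abstract describes higher-order methods as exploiting.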