Mining Positive and Negative Patterns for Relevance Feature Discovery |
|
Course URL: | http://videolectures.net/kdd2010_li_mpn/ |
Lecturer: | Yuefeng Li |
Institution: | Queensland University of Technology |
Date: | 2010-10-01 |
Language: | English |
Course description: | Guaranteeing the quality of relevance features discovered in text documents for describing user preferences is a major challenge because of the large number of terms, patterns, and noise. Most popular text mining and classification methods adopt term-based approaches, but these all suffer from the problems of polysemy and synonymy. Over the years, it has often been hypothesized that pattern-based methods should perform better than term-based ones in describing user preferences, yet many experiments do not support this hypothesis. The innovative technique presented in the paper offers a breakthrough on this difficulty. It discovers both positive and negative patterns in text documents as higher-level features, and uses them to weight low-level features (terms) accurately based on their specificity and their distribution in the higher-level features. Substantial experiments on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio, or Support Vector Machine and pattern-based methods on precision, recall, and F-measure. |
Keywords: | text mining; low-level feature weighting; support vector machine |
Source: | VideoLectures.NET |
Last reviewed: | 2019-05-11:lxf |
Views: | 16 |