A Concept-based Model for Enhancing Text Categorization
Course URL: http://videolectures.net/kdd07_shehata_acbm/
Lecturer: Shady Shehata
Institution: University of Waterloo
Date: 2007-09-14
Language: English
Course description: Most text categorization techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures only the importance of a term within a document. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. Thus, the underlying model should identify the terms that capture the semantics of the text. In this case, the model can capture the terms that present the concepts of a sentence, which leads to discovering the topic of the document. A new concept-based model is introduced that analyzes terms at both the sentence and document levels, rather than at the document level only as in traditional analysis. The concept-based model can effectively discriminate between terms that are unimportant to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, and a concept extractor. A term that contributes to the sentence semantics is assigned two different weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation. These two weights are combined into a new weight, and the concept extractor selects the concepts with the maximum combined weights. A set of text categorization experiments using the proposed concept-based model is conducted on different datasets. The experiments compare traditional weighting with the concept-based weighting obtained by combining the concept-based statistical analyzer and the conceptual ontological graph. The evaluation of results relies on two quality measures, the macro-averaged F1 and the error rate. Both measures improve when the newly developed concept-based model is used to enhance the quality of text categorization.
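The two quality measures named in the description can be illustrated with a minimal sketch. It assumes single-label classification and the standard textbook definitions (per-class F1 averaged without weighting, and the fraction of misclassified documents); the lecture's exact definitions may differ. The function names and the sample labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean.

    Assumption: each document has exactly one true label and one predicted label.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp = defaultdict(int)  # true positives per class
    fp = defaultdict(int)  # false positives per class
    fn = defaultdict(int)  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def error_rate(y_true, y_pred):
    """Fraction of documents whose predicted label differs from the true label."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Hypothetical labels, for illustration only.
y_true = ["sports", "sports", "politics", "tech"]
y_pred = ["sports", "politics", "politics", "tech"]
print(macro_f1(y_true, y_pred))    # 7/9, about 0.778
print(error_rate(y_true, y_pred))  # 0.25
```

Macro-averaging gives every category equal influence regardless of its size, so a weighting scheme that helps small categories shows up more clearly than it would under a document-level average.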
Keywords: underlying model; statistical analysis; conceptual ontology
Source: VideoLectures.NET
Last reviewed: 2020-06-27:zyk
Views: 66