Automatic Labeling of Multinomial Topic Models
Course URL: http://videolectures.net/kdd07_mei_alm/
Lecturer: Qiaozhu Mei
Institution: University of Illinois
Date: 2007-08-14
Language: English
Course description: Multinomial distributions over words are frequently used to model topics in text collections. A common, major challenge in applying such topic models to any text mining problem is labeling a multinomial topic model accurately so that a user can interpret the discovered topic. So far, such labels have been generated manually and subjectively. In this paper, we propose probabilistic approaches to labeling multinomial topic models automatically and objectively. We cast the labeling problem as an optimization problem that involves minimizing the Kullback-Leibler divergence between word distributions and maximizing the mutual information between a label and a topic model. Experiments with a user study were carried out on two text data sets of different genres. The results show that the proposed labeling methods are quite effective at generating labels that are meaningful and useful for interpreting the discovered topic models. Our methods are general and can be applied to labeling topics learned through all kinds of topic models, such as PLSA, LDA, and their variations.
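The KL-divergence side of the optimization described above can be illustrated with a small Python sketch. It is not the authors' implementation: the function names (`kl_divergence`, `smooth`, `rank_labels`), the additive smoothing, and the toy topic and context counts are all assumptions made for illustration. Each candidate label is represented by word counts from a hypothetical context, and labels are ranked by how close their smoothed word distribution is to the topic's word distribution.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for word distributions stored as {word: probability} dicts."""
    return sum(
        pw * math.log(pw / max(q.get(w, 0.0), eps))
        for w, pw in p.items()
        if pw > 0
    )


def smooth(counts, vocab, alpha=0.01):
    """Turn raw counts into a distribution over vocab with additive smoothing (an assumption)."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def rank_labels(topic, label_contexts, alpha=0.01):
    """Order candidate labels from smallest to largest KL(topic || label distribution)."""
    vocab = sorted(topic)
    scored = []
    for label, counts in label_contexts.items():
        label_dist = smooth(counts, vocab, alpha)
        scored.append((kl_divergence(topic, label_dist), label))
    return [label for _, label in sorted(scored)]


# Toy topic model: a multinomial distribution over four words (made-up values).
topic = {"clustering": 0.4, "algorithm": 0.3, "data": 0.2, "retrieval": 0.1}

# Hypothetical word counts observed in the context of each candidate label.
label_contexts = {
    "clustering algorithm": {"clustering": 8, "algorithm": 6, "data": 4, "retrieval": 2},
    "information retrieval": {"retrieval": 9, "data": 3, "algorithm": 1},
}

ranked = rank_labels(topic, label_contexts)
print(ranked)
```

In this toy setup the label whose context words are distributed most like the topic is ranked first. A fuller treatment would also need the mutual-information criterion mentioned in the description, which this sketch leaves out.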
Keywords: automatic model labeling; PLSA; LDA and its variants
Source: VideoLectures.NET
Last reviewed: 2019-06-21 (chenxin)