
Accounting for Burstiness in Topic Models
Lecturer: Gabriel Doyle
Institution: University of California
Date: 2009-08-26
Language: English
Description: Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
Keywords: topic models; Dirichlet allocation; computer science; text mining
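
The burstiness effect mentioned in the description can be illustrated with a small numerical sketch. This is not the topic model presented in the lecture; it only compares a plain multinomial with a single DCM distribution on two toy documents. The function names, the four-word vocabulary, and the parameter values (uniform word probabilities of 0.25 and DCM parameters of 0.1) are assumptions chosen for illustration.

```python
import math


def multinomial_log_prob(counts, probs):
    """Log-probability of a word-count vector under a multinomial."""
    n = sum(counts)
    # log of the multinomial coefficient n! / prod(x_w!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    return log_coef + sum(
        x * math.log(p) for x, p in zip(counts, probs) if x > 0
    )


def dcm_log_prob(counts, alphas):
    """Log-probability of a word-count vector under a Dirichlet compound
    multinomial (DCM) with parameters `alphas`."""
    n = sum(counts)
    alpha_sum = sum(alphas)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Gamma(A) / Gamma(A + n), where A is the sum of the alphas
    log_norm = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + n)
    # prod over words of Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_terms = sum(
        math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alphas)
    )
    return log_coef + log_norm + log_terms


# Illustrative (assumed) setup: a 4-word vocabulary and 4-token documents.
probs = [0.25, 0.25, 0.25, 0.25]   # multinomial word probabilities
alphas = [0.1, 0.1, 0.1, 0.1]      # small alphas make the DCM bursty

bursty_doc = [4, 0, 0, 0]   # one word repeated four times
spread_doc = [1, 1, 1, 1]   # every word used once

if __name__ == "__main__":
    for name, doc in [("bursty", bursty_doc), ("spread", spread_doc)]:
        print(
            name,
            "multinomial:", multinomial_log_prob(doc, probs),
            "DCM:", dcm_log_prob(doc, alphas),
        )
```

Under these assumed parameters the multinomial assigns higher probability to the evenly spread document, while the DCM prefers the document in which one word repeats. Both distributions have the same expected word frequencies here, so the difference comes only from the DCM's ability to model repeated use of a word once it has appeared.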
