Accounting for Burstiness in Topic Models
Course URL: http://videolectures.net/icml09_doyle_abtm/
Lecturer: Gabriel Doyle
Institution: University of California
Date: 2009-08-26
Language: English
Course description: Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
Keywords: topic models; Dirichlet allocation; computer science; text mining
Source: VideoLectures.NET
Data collected: 2023-03-07:chenjy
Last reviewed: 2023-03-07:chenjy
Views: 21
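
The course description says that a Dirichlet compound multinomial (DCM) distribution can capture burstiness where a plain multinomial cannot. The sketch below illustrates only that single point, using the standard DCM probability mass function; it is not the topic model from the lecture. The function names, the four-word vocabulary, and the parameter values are all assumptions chosen for illustration.

```python
import math


def dcm_log_prob(counts, alpha):
    """Log-probability of a word-count vector under a DCM distribution.

    counts: non-negative integer counts, one per vocabulary word.
    alpha:  positive Dirichlet parameters, one per vocabulary word.
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(x_w!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Normalizer: Gamma(A) / Gamma(n + A), where A = sum(alpha)
    log_norm = math.lgamma(a) - math.lgamma(n + a)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_terms = sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_coef + log_norm + log_terms


def multinomial_log_prob(counts, probs):
    """Log-probability of a word-count vector under a plain multinomial."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    return log_coef + sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)


# Illustrative (assumed) setup: four words, four tokens per document.
bursty = [4, 0, 0, 0]   # one word repeated four times
spread = [1, 1, 1, 1]   # four different words, once each

alpha = [0.1, 0.1, 0.1, 0.1]      # small alpha values favor repeated words
probs = [0.25, 0.25, 0.25, 0.25]  # multinomial with the same mean proportions

print("DCM         bursty vs spread:",
      dcm_log_prob(bursty, alpha), dcm_log_prob(spread, alpha))
print("Multinomial bursty vs spread:",
      multinomial_log_prob(bursty, probs), multinomial_log_prob(spread, probs))
```

With these assumed parameters, the DCM assigns a higher probability to the bursty document than to the spread-out one, while the multinomial with the same mean word proportions ranks them the other way round. Small `alpha` values are what produce this effect; as the `alpha` values grow large with fixed proportions, the DCM approaches the multinomial.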