A Dual Markov Chain Topic Model for Dynamic Environments |
Course URL: | http://videolectures.net/kdd2018_acharya_markov_environments/ |
Lecturer: | Ayan Acharya |
Institution: | University of Texas at Austin |
Lecture date: | 2018-11-23 |
Language: | English |
Abstract: | The abundance of digital text has led to extensive research on topic models that reason about documents using latent representations. For many online or streaming text sources, such as news outlets, the number and nature of topics change over time, and several efforts have tried to address this with dynamic versions of topic models. Unfortunately, existing approaches face more complex inference when their model parameters vary over time, resulting in high computational complexity and degraded performance. This paper introduces DM-DTM, a dual Markov chain dynamic topic model for characterizing a corpus that evolves over time. The model uses a gamma Markov chain and a Dirichlet Markov chain to allow the topic popularities and word-topic assignments, respectively, to vary smoothly over time. Novel applications of the negative binomial augmentation trick yield simple, efficient, closed-form updates for all the required conditional posteriors, giving far lower computational requirements and less sensitivity to initial conditions than existing approaches. Moreover, via a gamma process prior, the number of topics is inferred directly from the data rather than being pre-specified, and it can vary as the data changes. Empirical comparisons on multiple real-world corpora demonstrate a clear superiority of DM-DTM over strong baselines for both static and dynamic topic models. |
Keywords: | dual Markov chain topic model; negative binomial augmentation trick; Dirichlet Markov chain; multiple real-world corpora |
Source: | VideoLectures.NET |
Data collected: | 2023-03-09:cyh |
Last reviewed: | 2023-05-15:cyh |
Views: | 32 |
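
The abstract describes two chains that let topic popularities and per-topic word distributions drift smoothly over time. The sketch below is only an illustration of that general idea, not the DM-DTM model or its inference procedure: the parameterization (shape `tau * w` with scale `1 / tau` for the gamma step, `Dir(eta * previous)` for the Dirichlet step), the function names, and the defaults are all assumptions chosen for readability.

```python
import random

EPS = 1e-9  # floor that keeps gamma shape parameters strictly positive


def gamma_chain_step(weights, tau, rng):
    """One step of a gamma Markov chain (assumed form): each new weight is
    Gamma(shape=tau * w, scale=1 / tau), so its mean is the previous weight w."""
    return [max(rng.gammavariate(max(tau * w, EPS), 1.0 / tau), EPS)
            for w in weights]


def dirichlet_chain_step(dist, eta, rng):
    """One step of a Dirichlet Markov chain (assumed form): draw from
    Dir(eta * previous distribution) by normalizing independent gamma draws."""
    draws = [max(rng.gammavariate(max(eta * p, EPS), 1.0), EPS) for p in dist]
    total = sum(draws)
    return [d / total for d in draws]


def simulate(num_topics=3, vocab_size=5, steps=4, tau=50.0, eta=200.0, seed=0):
    """Simulate topic popularities and word distributions over time."""
    rng = random.Random(seed)
    weights = [1.0] * num_topics                      # topic popularities
    topics = [[1.0 / vocab_size] * vocab_size         # word distribution per topic
              for _ in range(num_topics)]
    history = [(weights, topics)]
    for _ in range(steps):
        weights = gamma_chain_step(weights, tau, rng)
        topics = [dirichlet_chain_step(t, eta, rng) for t in topics]
        history.append((weights, topics))
    return history


if __name__ == "__main__":
    for t, (weights, topics) in enumerate(simulate()):
        print(t, [round(w, 3) for w in weights])
```

In this sketch, larger values of the hypothetical `tau` and `eta` concentrate each step around the previous state, which gives smoother trajectories; smaller values let popularities and word distributions move faster between time steps.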