Incorporating Domain Knowledge into Topic Modeling via Dirichlet Forest Priors
Course URL: http://videolectures.net/icml09_andrzejewski_idk/
Lecturer: David Andrzejewski
Institution: University of Wisconsin
Date: 2009-08-26
Language: English
Course description: Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model’s ability to follow and generalize beyond user-specified domain knowledge.
Keywords: Gibbs sampling; topic modeling; Latent Dirichlet Allocation framework
Source: VideoLectures.NET
Last reviewed: 2020-07-14:yumf
Views: 45