Online Variational Inference for the Hierarchical Dirichlet Process |
|
Course URL: | http://videolectures.net/aistats2011_wang_online/ |
Lecturer: | Chong Wang |
Institution: | Princeton University |
Date: | 2011-03-06 |
Language: | English |
Course description: | The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixed-membership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions of terms that reflect recurring patterns (or “topics”) in the collection. Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions. One limitation of HDP analysis is that existing posterior inference algorithms require multiple passes through all the data; these algorithms are intractable for very large scale applications. We propose an online variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. Our algorithm is significantly faster than traditional inference algorithms for the HDP, and lets us analyze much larger data sets. We illustrate the approach on two large collections of text, showing improved performance over online LDA, the finite counterpart to the HDP topic model. |
Keywords: | Bayesian nonparametrics; HDP inference algorithms |
Source: | VideoLectures.NET |
Data collected: | 2021-05-08:zyk |
Last reviewed: | 2021-05-08:zyk |
Views: | 60 |