Reducing the Sampling Complexity of Topic Models
Course URL: http://videolectures.net/kdd2014_li_sampling_complexity/
Lecturer: Aaron Li
Institution: Carnegie Mellon University
Date: 2014-10-07
Language: English
Abstract: Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately, the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm which scales linearly with the number of actually instantiated topics k_d in the document. For large document collections and in structured hierarchical models k_d ≪ k. This yields an order of magnitude speedup. Our method applies to a wide variety of statistical models such as the PDP [16,4] and HDP [19]. At its core is the idea that dense, slowly changing distributions can be approximated efficiently by the combination of a Metropolis-Hastings step, use of sparsity, and amortized constant-time sampling via Walker's alias method.
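
The abstract compresses two ingredients that may be easier to see in code. Below is a minimal Python sketch, not the authors' implementation: Walker's alias method, which spends O(k) building a table so that each later draw costs O(1), and an independence Metropolis-Hastings step that keeps samples exact even when that table is stale. The sparsity term of the paper's decomposition is omitted for brevity, and all names (build_alias_table, mh_step, the example weights) are illustrative assumptions.

```python
import random

def build_alias_table(probs):
    """Walker's alias method: O(k) setup so that each subsequent
    draw from a fixed discrete distribution costs O(1)."""
    k = len(probs)
    scaled = [p * k for p in probs]           # rescale so the mean mass is 1
    prob = [0.0] * k                          # coin-flip threshold per bucket
    alias = [0] * k                           # overflow outcome per bucket
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]                   # bucket s keeps its own mass...
        alias[s] = l                          # ...and is topped up from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                   # leftovers are exactly full
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    """One O(1) draw: pick a bucket uniformly, then one biased coin flip."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

def mh_step(current, p_true, q_stale, q_table):
    """One Metropolis-Hastings step using the stale alias table as an
    independence proposal q. Accepting with min(1, p(t')q(t) / (p(t)q(t')))
    corrects for the table lagging behind the true distribution p, so the
    chain still targets p while table rebuilds are amortized away."""
    prob, alias = q_table
    t = alias_draw(prob, alias)
    accept = (p_true[t] * q_stale[current]) / (p_true[current] * q_stale[t])
    return t if random.random() < min(1.0, accept) else current

if __name__ == "__main__":
    q = [0.25, 0.25, 0.25, 0.25]              # stale proposal (old topic weights)
    p = [0.70, 0.10, 0.10, 0.10]              # current true topic weights
    table = build_alias_table(q)
    topic = 0
    for _ in range(5):
        topic = mh_step(topic, p, q, table)
    print("sampled topic:", topic)
```

The point of the combination is amortization: the dense part of the topic distribution changes slowly, so one O(k) table rebuild can be reused for many draws, and the Metropolis-Hastings acceptance test removes the bias that a stale proposal would otherwise introduce.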
Keywords: latent variables; topic model inference; structured hierarchical models
Source: VideoLectures.NET
Data collected: 2021-05-27:zyk
Last reviewed: 2021-05-27:zyk
Views: 34