

Larger Residuals Less Work: Active Document Scheduling for Latent Dirichlet Allocation
Lecture URL: http://videolectures.net/ecmlpkdd2011_wahabzada_allocation/
Lecturer: Mirwaes Wahabzada
Institution: Fraunhofer-Gesellschaft
Date: 2011-11-30
Language: English
Abstract: Recently, there have been considerable advances in fast inference for latent Dirichlet allocation (LDA). In particular, stochastic optimization of the variational Bayes (VB) objective function with a natural gradient step was proved to converge and is able to process massive document collections. To reduce noise in the gradient estimate, this method considers multiple documents chosen uniformly at random. While it is widely recognized that the scheduling of documents in stochastic optimization may have significant consequences, this issue remains largely unexplored. In this work, we address it. Specifically, we propose residual LDA, a novel, easy-to-implement LDA approach that schedules documents in an informed way. Intuitively, in each iteration, residual LDA actively selects documents that exert a disproportionately large influence on the current residual to compute the next update. On several real-world datasets, including 3M articles from Wikipedia, we demonstrate that residual LDA can handily analyze massive document collections and find topic models as good as or better than those found with batch VB and randomly scheduled VB, and significantly faster.
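The abstract does not spell out how the residual is defined or how documents are drawn, so the following is only a minimal sketch of residual-driven scheduling under stated assumptions, not the authors' implementation. It assumes (a) a document's residual is the L1 change in its per-document statistics between its last two visits, and (b) each minibatch is drawn without replacement with probability proportional to residual (Efraimidis-Spirakis weighted sampling). `local_step` is a hypothetical stand-in for the per-document variational E-step, and the global natural-gradient update of the topics is deliberately left out. With equal residuals the draw reduces to the uniform random scheduling used by stochastic VB.

```python
import random
from typing import Callable, Dict, List, Sequence


def residual(old_stats: Sequence[float], new_stats: Sequence[float]) -> float:
    """Assumed residual: L1 distance between a document's previous and
    current statistics vectors."""
    return sum(abs(a - b) for a, b in zip(old_stats, new_stats))


def select_minibatch(residuals: Sequence[float], batch_size: int,
                     rng: random.Random) -> List[int]:
    """Draw up to batch_size distinct document ids, each with probability
    proportional to its residual (Efraimidis-Spirakis: key = u ** (1 / w),
    keep the largest keys). Documents with zero residual are never drawn."""
    keys = []
    for doc_id, r in enumerate(residuals):
        if r > 0:
            u = rng.random()  # uniform in [0, 1)
            keys.append((u ** (1.0 / r), doc_id))
    keys.sort(reverse=True)
    return [doc_id for _, doc_id in keys[:batch_size]]


def schedule(num_docs: int, batch_size: int, num_iters: int,
             local_step: Callable[[int], List[float]],
             rng: random.Random) -> List[float]:
    """Hypothetical driver loop. local_step(doc_id) stands in for the
    per-document E-step and returns that document's statistics vector."""
    residuals = [1.0] * num_docs  # equal start, so the first draws are uniform
    last_stats: Dict[int, List[float]] = {}
    for _ in range(num_iters):
        batch = select_minibatch(residuals, batch_size, rng)
        for doc_id in batch:
            new = local_step(doc_id)
            old = last_stats.get(doc_id)
            if old is not None:
                # Only revisited documents get a measured residual.
                residuals[doc_id] = residual(old, new)
            last_stats[doc_id] = new
        # The global topic update is omitted from this sketch.
    return residuals
```

In this sketch, documents whose statistics stop changing drift toward zero residual and are visited less often, which is the intuition the abstract describes. A practical version would likely need a floor on residuals so that no document is starved indefinitely.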
Keywords: Computer Science; Machine Learning; Active Documents
Source: VideoLectures.NET
Last reviewed: 2020-06-24 (yumf)