Memory Bounded Inference in Topic Models |
|
Course URL: | http://videolectures.net/icml08_gomes_mbi/
Lecturer: | Ryan Gomes
Institution: | California Institute of Technology
Date: | 2008-07-29
Language: | English
Description: | What type of algorithms and statistical techniques support learning from very large datasets over long stretches of time? We address this question through a memory-bounded version of a variational EM algorithm that approximates inference in a topic model. The algorithm alternates between two phases, "model building" and "model compression", in order to always satisfy a given memory constraint. The model building phase grows the internal representation (the number of topics) via Bayesian model selection as more data arrives. Compression is achieved by merging data items into clumps and caching only their sufficient statistics. Empirically, the resulting algorithm is able to handle datasets that are orders of magnitude larger than the standard batch version can handle. A minimal sketch of the clump bookkeeping follows the listing below.
Keywords: | statistical techniques for large-scale learning; Bayesian model selection; variational EM algorithm
Source: | VideoLectures.NET
Last reviewed: | 2020-06-03: 张荧 (volunteer course editor)
Views: | 55
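
The compression phase described above caches only sufficient statistics for merged groups of documents ("clumps"). The Python sketch below is a minimal illustration of that bookkeeping, not the authors' implementation: it keeps a bounded cache of clumps, each holding a document count and a summed word-count vector, and greedily merges the most similar pair whenever the budget is exceeded. The names (`Clump`, `MEMORY_BUDGET`, `merge_closest`) and the nearest-pair merging rule are illustrative assumptions; in the paper, compression and topic growth are driven by the variational objective and Bayesian model selection.

```python
import numpy as np

MEMORY_BUDGET = 50  # maximum number of cached clumps (assumed constraint)

class Clump:
    """Sufficient statistics for a group of merged documents."""
    def __init__(self, counts):
        self.n = 1                                 # number of documents absorbed
        self.s = np.asarray(counts, dtype=float)   # summed word-count vector

    def merge(self, other):
        """Absorb another clump: sufficient statistics simply add."""
        self.n += other.n
        self.s += other.s

def merge_closest(clumps):
    """Greedily merge the pair of clumps with the most similar mean counts."""
    best, bi, bj = np.inf, 0, 1
    for i in range(len(clumps)):
        for j in range(i + 1, len(clumps)):
            d = np.sum((clumps[i].s / clumps[i].n - clumps[j].s / clumps[j].n) ** 2)
            if d < best:
                best, bi, bj = d, i, j
    clumps[bi].merge(clumps.pop(bj))

def process_stream(doc_stream):
    """Cache incoming documents, compressing whenever the budget is exceeded."""
    clumps = []
    for counts in doc_stream:
        clumps.append(Clump(counts))         # new data enters the cache
        while len(clumps) > MEMORY_BUDGET:   # "model compression" phase
            merge_closest(clumps)
        # A full implementation would run variational EM over the clump
        # statistics here, and grow the number of topics whenever Bayesian
        # model selection favors a larger model ("model building" phase).
    return clumps

# Toy usage: stream 200 random documents over a 30-word vocabulary.
rng = np.random.default_rng(0)
docs = rng.poisson(1.0, size=(200, 30))
cache = process_stream(docs)
print(f"{len(cache)} clumps cached for {sum(c.n for c in cache)} documents")
```

Because the cached statistics are additive, a clump of any size occupies the same memory as a single document, which is what lets a fixed-size cache summarize a dataset far larger than the budget.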