Fully Distributed EM for Very Large Datasets |
|
Course URL: | http://videolectures.net/icml08_wolfe_fdem/ |
Lecturer: | Jason Wolfe |
Institution: | University of California, Berkeley |
Date: | 2008-07-30 |
Language: | English |
Course description: | In EM and related algorithms, E-step computations distribute easily, because data items are independent given the parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework which fully distributes the entire EM procedure. Each node interacts only with the parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce approach on two tasks: word alignment and topic modeling. |
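The core idea in the abstract, that the E-step distributes over data shards and the M-step can be performed by combining additive sufficient statistics via messages passed toward a root node, can be illustrated with a minimal sketch. This is not the paper's implementation (which uses a junction-tree topology over parameter-sharing structure); it is a simplified toy version for a 1-D two-component Gaussian mixture with fixed variance, where "nodes" are data shards and messages are combined along a linear chain:

```python
# Sketch of distributed EM: each node computes local E-step statistics
# on its own data shard; per-shard sufficient statistics are additive,
# so they can be combined by message passing toward a root, which then
# performs the M-step and broadcasts updated parameters.
# All names here (e_step, combine, distributed_em) are illustrative.
import math

def e_step(shard, means, var=1.0):
    """Local E-step: sufficient statistics for a 2-component 1-D Gaussian mixture."""
    stats = [[0.0, 0.0], [0.0, 0.0]]  # per component: [sum of resp., resp.-weighted sum of x]
    for x in shard:
        w = [math.exp(-(x - m) ** 2 / (2 * var)) for m in means]
        z = sum(w)
        for k in range(2):
            r = w[k] / z          # responsibility of component k for x
            stats[k][0] += r
            stats[k][1] += r * x
    return stats

def combine(a, b):
    """Message combination: sufficient statistics simply add."""
    return [[a[k][0] + b[k][0], a[k][1] + b[k][1]] for k in range(2)]

def distributed_em(shards, means, iters=20):
    for _ in range(iters):
        # Each "node" computes statistics on its shard only; messages are
        # combined pairwise toward a root (here along a simple chain).
        msgs = [e_step(s, means) for s in shards]
        total = msgs[0]
        for m in msgs[1:]:
            total = combine(total, m)
        # Root performs the M-step and broadcasts the new means.
        means = [total[k][1] / total[k][0] for k in range(2)]
    return means

# Toy data split across three shards; clusters near -2 and +2.
shards = [[-2.1, -1.9, -2.0], [1.8, 2.2], [2.0, -2.0, 1.9]]
means = distributed_em(shards, [-1.0, 1.0])
```

Because the sufficient statistics add, no single node ever needs the full data set; the lecture's contribution goes further by also partitioning the parameters themselves across nodes. |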
Keywords: | Machine learning; Topic modeling; MapReduce |
Source: | VideoLectures.NET |
Last reviewed: | 2020-06-24: yumf |
Views: | 29 |