
Fully Distributed EM for Very Large Datasets
Course URL: http://videolectures.net/icml08_wolfe_fdem/
Lecturer: Jason Wolfe
Institution: University of California, Berkeley
Date: 2008-07-30
Language: English
Abstract: In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework which fully distributes the entire EM procedure. Each node interacts with only parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce approach, on two tasks: word alignment and topic modeling.
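The abstract's starting observation — that the E-step distributes easily because each data item's expected sufficient statistics depend only on the current parameters — can be illustrated with a small sketch. This is not the talk's junction-tree method (which avoids centralizing the parameters); it is the baseline MapReduce-style pattern the talk improves upon, shown here for a toy mixture of two biased coins with the data partitioned across simulated nodes. All names (`e_step_node`, `em`, `partitions`) are illustrative, not from the talk.

```python
# Toy MapReduce-style EM: each "node" computes local expected counts in
# parallel (E-step), a reduce step sums them, and the M-step re-estimates
# the coin biases from the pooled statistics. Model: each sequence of flips
# was generated entirely by coin A or coin B, with unknown biases.

def e_step_node(sequences, theta_a, theta_b):
    """One node's expected [heads, tails] counts for each coin."""
    stats = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
    for seq in sequences:
        h = sum(seq)
        t = len(seq) - h
        # Likelihood of the sequence under each coin.
        la = theta_a**h * (1.0 - theta_a)**t
        lb = theta_b**h * (1.0 - theta_b)**t
        wa = la / (la + lb)  # posterior responsibility of coin A
        wb = 1.0 - wa
        stats["a"][0] += wa * h; stats["a"][1] += wa * t
        stats["b"][0] += wb * h; stats["b"][1] += wb * t
    return stats

def em(partitions, theta_a=0.6, theta_b=0.5, iters=20):
    for _ in range(iters):
        # "Map": every node works independently given the current parameters.
        node_stats = [e_step_node(p, theta_a, theta_b) for p in partitions]
        # "Reduce" + M-step: sum partial counts, then normalize centrally.
        # (This central aggregation is exactly what becomes impractical at
        # scale, motivating the fully distributed scheme in the talk.)
        ah = sum(s["a"][0] for s in node_stats)
        at = sum(s["a"][1] for s in node_stats)
        bh = sum(s["b"][0] for s in node_stats)
        bt = sum(s["b"][1] for s in node_stats)
        theta_a = ah / (ah + at)
        theta_b = bh / (bh + bt)
    return theta_a, theta_b

# Two nodes, each holding flip sequences (1 = heads).
partitions = [
    [[1, 1, 1, 1, 0], [1, 1, 1, 0, 1]],  # mostly-heads sequences
    [[0, 0, 1, 0, 0], [0, 1, 0, 0, 0]],  # mostly-tails sequences
]
theta_a, theta_b = em(partitions)
print(theta_a, theta_b)  # coin A converges toward heads-heavy, B toward tails-heavy
```

Because each node only needs the current parameter vector to compute its partial statistics, the map phase parallelizes perfectly; the bottleneck is the reduce/M-step, which here touches every parameter on one node.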
Keywords: Machine learning; Topic modeling; MapReduce
Source: VideoLectures.NET (视频讲座网)
Last reviewed: 2020-06-24 (yumf)
Views: 29