
Fully Distributed EM for Very Large Datasets
Course URL: http://videolectures.net/icml08_wolfe_fdem/
Lecturer: Jason Wolfe
Institution: University of California, Berkeley
Date: 2008-07-30
Language: English
Abstract: In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework which fully distributes the entire EM procedure. Each node interacts with only parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce approach, on two tasks: word alignment and topic modeling.
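The abstract's starting observation — that the E-step distributes easily because each data item's expected sufficient statistics depend only on the current parameters — can be illustrated with a small sketch. This is not the talk's junction-tree method (which avoids centralizing the parameters); it is the baseline MapReduce-style pattern the talk improves upon, shown here for a toy mixture of two biased coins with the data partitioned across simulated nodes. All names (`e_step_node`, `em`, `partitions`) are illustrative, not from the talk.

```python
# Toy MapReduce-style EM: each "node" computes local expected counts in
# parallel (E-step), a reduce step sums them, and the M-step re-estimates
# the coin biases from the pooled statistics. Model: each sequence of flips
# was generated entirely by coin A or coin B, with unknown biases.

def e_step_node(sequences, theta_a, theta_b):
    """One node's expected [heads, tails] counts for each coin."""
    stats = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
    for seq in sequences:
        h = sum(seq)
        t = len(seq) - h
        # Likelihood of the sequence under each coin.
        la = theta_a**h * (1.0 - theta_a)**t
        lb = theta_b**h * (1.0 - theta_b)**t
        wa = la / (la + lb)  # posterior responsibility of coin A
        wb = 1.0 - wa
        stats["a"][0] += wa * h; stats["a"][1] += wa * t
        stats["b"][0] += wb * h; stats["b"][1] += wb * t
    return stats

def em(partitions, theta_a=0.6, theta_b=0.5, iters=20):
    for _ in range(iters):
        # "Map": every node works independently given the current parameters.
        node_stats = [e_step_node(p, theta_a, theta_b) for p in partitions]
        # "Reduce" + M-step: sum partial counts, then normalize centrally.
        # (This central aggregation is exactly what becomes impractical at
        # scale, motivating the fully distributed scheme in the talk.)
        ah = sum(s["a"][0] for s in node_stats)
        at = sum(s["a"][1] for s in node_stats)
        bh = sum(s["b"][0] for s in node_stats)
        bt = sum(s["b"][1] for s in node_stats)
        theta_a = ah / (ah + at)
        theta_b = bh / (bh + bt)
    return theta_a, theta_b

# Two nodes, each holding flip sequences (1 = heads).
partitions = [
    [[1, 1, 1, 1, 0], [1, 1, 1, 0, 1]],  # mostly-heads sequences
    [[0, 0, 1, 0, 0], [0, 1, 0, 0, 0]],  # mostly-tails sequences
]
theta_a, theta_b = em(partitions)
print(theta_a, theta_b)  # coin A converges toward heads-heavy, B toward tails-heavy
```

Because each node only needs the current parameter vector to compute its partial statistics, the map phase parallelizes perfectly; the bottleneck is the reduce/M-step, which here touches every parameter on one node.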
Keywords: Machine learning; Topic modeling; MapReduce
Source: VideoLectures.NET (视频讲座网)
Last reviewed: 2020-06-24 (yumf)
Views: 29