Hadoop-ML: An Infrastructure for the Rapid Implementation of Parallel Reusable Analytics
Course URL: http://videolectures.net/nipsworkshops09_pednault_hmli/
Lecturer: Edwin Pednault
Institution: Watson Research Center
Date: 2010-01-19
Language: English
Abstract: Hadoop is an open-source implementation of Google's Map-Reduce programming model. Over the past few years, it has evolved into a popular platform for parallelization in industry and academia. Furthermore, trends suggest that Hadoop will likely be the analytics platform of choice on forthcoming cloud-based systems. Unfortunately, implementing parallel machine learning/data mining (ML/DM) algorithms on Hadoop is complex and time-consuming. To address this challenge, we present Hadoop-ML, an infrastructure to facilitate the implementation of parallel ML/DM algorithms on Hadoop. Hadoop-ML has been designed to allow for the specification of both task-parallel and data-parallel ML/DM algorithms. Furthermore, it supports the composition of parallel ML/DM algorithms from both serial and parallel building blocks, which allows one to write reusable parallel code. The proposed abstraction eases the implementation process by requiring the user to specify only computations and their dependencies, without worrying about scheduling, data management, and communication. As a consequence, the code is portable in that the user never needs to write Hadoop-specific code. This potentially allows one to leverage future parallelization platforms without rewriting one's code.
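Illustration: the idea of specifying only computations and their dependencies can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Hadoop-ML's actual interface: the names `run_graph`, `tasks`, and `deps` are hypothetical, and a local thread pool stands in for a real parallel platform. The user declares what to compute and what each step depends on; the runner decides the order and runs independent steps concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_graph(tasks, deps, max_workers=4):
    """Run tasks in dependency order (hypothetical helper, not Hadoop-ML).

    tasks: name -> function that receives the results of its dependencies
    deps:  name -> list of dependency names (missing key means no dependencies)
    """
    sorter = TopologicalSorter({name: deps.get(name, []) for name in tasks})
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task whose inputs are already available can run now.
            ready = sorter.get_ready()
            futures = {
                name: pool.submit(
                    tasks[name], *(results[d] for d in deps.get(name, []))
                )
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


# Data-parallel step: one partial (sum, count) per partition.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
tasks = {
    f"partial_{i}": (lambda part=part: (sum(part), len(part)))
    for i, part in enumerate(partitions)
}

# Serial step: combine the partials into a global mean.
tasks["mean"] = lambda *partials: (
    sum(s for s, _ in partials) / sum(n for _, n in partials)
)
deps = {"mean": [f"partial_{i}" for i in range(len(partitions))]}

results = run_graph(tasks, deps)
print(results["mean"])
```

Because the task definitions never mention threads or scheduling, the same `tasks` and `deps` could in principle be handed to a different runner, which is the portability argument the abstract makes.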
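Keywords: programming model; parallelization platform; cloud computing; parallel machine learning; infrastructure
Source: VideoLectures.NET
Last reviewed: 2020-06-03:wuyq
Views: 44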