Scalable R on Spark |
|
Course URL: | http://videolectures.net/kdd2016_tutorial_scalable_r_on_spark/ |
Lecturers: | John-Mark Agosta; Debraj GuhaThakurta |
Institution: | Microsoft |
Date: | 2016-09-16 |
Language: | English |
Abstract: | R is one of the most popular languages in the data science, statistics, and machine learning (ML) communities. However, when it comes to scalable data analysis and ML in R, many data scientists are hindered by (a) R's limited set of functions for handling large datasets efficiently, and (b) a lack of knowledge about which computing environments are appropriate for scaling R scripts from desktop exploratory analysis to elastic, distributed cloud services. In this tutorial we will discuss solutions that demonstrate the use of distributed computing environments and end-to-end solutions for R. We will present the topics through presentations and hands-on examples with sample code. In addition, we will provide a public code repository that attendees can access and adapt to their own practice. We believe this tutorial will be of strong interest to the large and growing community of data scientists and developers who use R for data analysis and modeling. Prerequisites: a laptop with a web browser and an ssh client that supports port forwarding. Access to cloud-based clusters will be provided. For R scripts, download details, and suggested reading, see the Readme.md file at
Keywords: | data science; machine learning; data analysis; function limitations |
Source: | VideoLectures.NET |
Data collected: | 2023-04-22: chenxin01 |
Last edited: | 2023-05-18: chenxin01 |