Addressing Challenges in Data Science: Scale, Skill Sets and Complexity
Course URL: http://videolectures.net/kdd2019_bradley_addressing_challenges/
Lecturer: Joseph K. Bradley
Institution: Databricks
Course date: 2020-03-02
Language: English
Course description: Data science in modern applications is pushing the limits of tools and organizations. The scale of data, the breadth of required skill sets, and the complexity of workflows all cause organizations to stumble when developing data-powered applications and moving them to production. This talk will discuss these challenges and Databricks’ efforts to overcome them within open source software projects like Apache Spark and MLflow. Apache Spark has simplified large-scale ETL and analytics, and its Project Hydrogen helps to bridge the gap between Spark and ML tools such as TensorFlow and Horovod. MLflow, an open source platform for managing ML lifecycles, facilitates experimentation, reproducibility and deployment. We will present insights from our collaborations on these projects, as well as our perspective at Databricks in facilitating data science for a wide variety of organizations and applications.
Keywords: data science; open source software; Databricks
Course source: VideoLectures.NET
Data collected: 2020-04-29:zhouxj
Last reviewed: 2020-05-25:cxin