Machine Learning Software in Practice: Quo Vadis?
Course URL: http://videolectures.net/kdd2017_pafka_machine_learning_software/
Lecturer: Szilard Pafka
Affiliation: Epoch
Date: 2017-10-09
Language: English
Course description: Due to the hype in our industry over the last couple of years, there is a growing mismatch between the software tools machine learning practitioners wish for, what they truly need for their work, what is available (either commercially or as open source), and what tool developers and researchers focus on. In this talk we will give a couple of examples of this mismatch. Several surveys and anecdotal evidence show that most practitioners work most of the time (at least in the modeling phase) with datasets that fit in the RAM of a single server; distributed computing tools are therefore very often overkill. Our benchmarks (available on GitHub [1]) of the most widely used open source tools for binary classification (various implementations of algorithms such as linear methods, random forests, gradient boosted trees and neural networks) on such data show more than a 10x difference in speed and more than a 10x difference in RAM usage between tools, with "big data" tools being the least efficient. Significant performance gains have been obtained by those tools that incorporate various low-level optimizations (close to the CPU and memory architecture). Nevertheless, we will show that even the best tools show degraded performance on multi-socket servers with a high number of cores, systems that have become widely accessible more recently. Finally, while most of this talk is about performance, we will also argue that machine learning tools with high-level, easy-to-use APIs increase practitioners' productivity and are therefore preferable.
Keywords: machine learning; open-source binary classification; software tools
Source: VideoLectures.NET
Data collected: 2022-11-18: chenjy
Last reviewed: 2022-11-18: chenjy
Views: 36