Mondrian forests: Efficient random forests for streaming data via Bayesian nonparametrics
Course URL: http://videolectures.net/sahd2014_teh_mondrian_forests/  
Lecturer: Yee Whye Teh
Institution: University of Oxford
Date: 2014-10-29
Language: English
Course description: Ensembles of randomized decision trees are widely used for classification and regression tasks in machine learning and statistics. They achieve competitive predictive performance and are computationally efficient to train (in the batch setting) and to test, making them excellent candidates for real-world prediction tasks. However, the most popular variants (such as Breiman's random forest and extremely randomized trees) work only in the batch setting and cannot easily handle streaming data. In this talk, I will present Mondrian Forests, where random decision trees are generated from a Bayesian nonparametric model called a Mondrian process (Roy and Teh, 2009). Making use of the remarkable consistency properties of the Mondrian process, we develop a variant of extremely randomized trees that can be constructed efficiently in an incremental fashion, making their use on streaming data simple and efficient. Experiments on real-world classification tasks demonstrate that Mondrian Forests achieve predictive performance comparable with existing online random forests and periodically retrained batch random forests, while being more than an order of magnitude faster, thus representing a better computation vs. accuracy tradeoff.
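As a rough illustration of the Mondrian process named in the description, the following is a minimal sketch of sampling a random axis-aligned partition of a fixed box. It assumes the standard generative description of the process (an exponential waiting time with rate equal to the sum of the box's side lengths, a cut dimension chosen in proportion to side length, and a uniform cut location). The names `sample_mondrian`, `leaf_boxes`, and `budget` are hypothetical, and the sketch does not cover the data-dependent bounding boxes or the online update discussed in the talk.

```python
import random


def sample_mondrian(lower, upper, budget, rng, time=0.0):
    """Sample a Mondrian partition of the box [lower, upper] as a nested dict.

    `budget` is the lifetime parameter: cuts whose time exceeds it are discarded.
    """
    lengths = [u - l for l, u in zip(lower, upper)]
    linear_dimension = sum(lengths)
    leaf = {"leaf": True, "lower": list(lower), "upper": list(upper)}
    if linear_dimension <= 0.0:
        return leaf
    # Waiting time until the next cut: exponential with rate = sum of side lengths.
    cut_time = time + rng.expovariate(linear_dimension)
    if cut_time > budget:
        return leaf
    # Cut dimension is chosen with probability proportional to its side length.
    dim = rng.choices(range(len(lengths)), weights=lengths)[0]
    # Cut location is uniform along the chosen side.
    loc = rng.uniform(lower[dim], upper[dim])
    left_upper = list(upper)
    left_upper[dim] = loc
    right_lower = list(lower)
    right_lower[dim] = loc
    return {
        "leaf": False,
        "dim": dim,
        "loc": loc,
        "time": cut_time,
        "left": sample_mondrian(lower, left_upper, budget, rng, cut_time),
        "right": sample_mondrian(right_lower, upper, budget, rng, cut_time),
    }


def leaf_boxes(node):
    """Yield (lower, upper) for every leaf cell of a sampled partition."""
    if node["leaf"]:
        yield node["lower"], node["upper"]
    else:
        yield from leaf_boxes(node["left"])
        yield from leaf_boxes(node["right"])


if __name__ == "__main__":
    rng = random.Random(0)
    tree = sample_mondrian([0.0, 0.0], [1.0, 1.0], budget=2.0, rng=rng)
    for lo, hi in leaf_boxes(tree):
        print(lo, hi)
```

A larger `budget` yields finer partitions on average; a forest would draw several such trees independently.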
Keywords: ensembles of randomized decision trees; machine learning; statistics
Source: VideoLectures.NET
Data collected: 2022-03-28:zkj
Last reviewed: 2022-03-28:zkj