Mondrian Forests: Efficient Random Forests for Streaming Data via Bayesian Nonparametrics
Course URL: http://videolectures.net/sahd2014_teh_mondrian_forests/
Lecturer: Yee Whye Teh
Institution: University of Oxford
Date: 2014-10-29
Language: English
Abstract: Ensembles of randomized decision trees are widely used for classification and regression tasks in machine learning and statistics. They achieve competitive predictive performance and are computationally efficient to train (batch setting) and test, making them excellent candidates for real-world prediction tasks. However, the most popular variants (such as Breiman's random forest and extremely randomized trees) work only in the batch setting and cannot handle streaming data easily. In this talk, I will present Mondrian Forests, where random decision trees are generated from a Bayesian nonparametric model called a Mondrian process (Roy and Teh, 2009). Making use of the remarkable consistency properties of the Mondrian process, we develop a variant of extremely randomized trees that can be constructed efficiently in an incremental fashion, making their use on streaming data simple and efficient. Experiments on real-world classification tasks demonstrate that Mondrian Forests achieve predictive performance comparable to existing online random forests and periodically retrained batch random forests, while being more than an order of magnitude faster, thus representing a better computation vs. accuracy trade-off.
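The key ingredient named in the abstract, the Mondrian process of Roy and Teh (2009), is a random recursive partition of an axis-aligned box. The Python sketch below is only an illustration of that generative process, not the implementation from the talk. It assumes a finite lifetime (budget) parameter `lifetime`. The waiting time before a block is cut is exponential with rate equal to the sum of the block's side lengths. The cut dimension is chosen with probability proportional to its side length, and the cut location is uniform along that side. All names (`MondrianNode`, `sample_mondrian`, `leaves`, `find_leaf`) are hypothetical. The Mondrian trees in the talk additionally adapt cuts to the observed data and are extended as new points arrive; that online update is not shown here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = List[Tuple[float, float]]  # one (lower, upper) interval per dimension


@dataclass
class MondrianNode:
    box: Box
    split_time: float                 # time at which this block is cut (inf for leaves)
    split_dim: Optional[int] = None
    split_loc: Optional[float] = None
    left: Optional["MondrianNode"] = None
    right: Optional["MondrianNode"] = None


def sample_mondrian(box: Box, lifetime: float, start_time: float = 0.0,
                    rng: Optional[random.Random] = None) -> MondrianNode:
    """Sample a Mondrian partition of `box` with the given (assumed finite) lifetime."""
    rng = rng or random.Random()
    lengths = [hi - lo for lo, hi in box]
    linear_dim = sum(lengths)
    # Waiting time until the next cut: exponential with rate = sum of side lengths.
    cost = rng.expovariate(linear_dim) if linear_dim > 0 else float("inf")
    split_time = start_time + cost
    if split_time >= lifetime:
        # The budget is exhausted before this block is cut, so it becomes a leaf.
        return MondrianNode(box=box, split_time=float("inf"))
    # Choose the cut dimension proportional to side length, then a uniform cut point.
    d = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    loc = rng.uniform(lo, hi)
    left_box = list(box)
    left_box[d] = (lo, loc)
    right_box = list(box)
    right_box[d] = (loc, hi)
    # Children start their own clocks at the parent's split time.
    return MondrianNode(box, split_time, d, loc,
                        sample_mondrian(left_box, lifetime, split_time, rng),
                        sample_mondrian(right_box, lifetime, split_time, rng))


def leaves(node: MondrianNode) -> List[Box]:
    """Return the boxes of all leaf blocks (the cells of the partition)."""
    if node.left is None:
        return [node.box]
    return leaves(node.left) + leaves(node.right)


def find_leaf(node: MondrianNode, x: List[float]) -> Box:
    """Route a point down the tree and return the box of the leaf containing it."""
    while node.left is not None:
        node = node.left if x[node.split_dim] <= node.split_loc else node.right
    return node.box


if __name__ == "__main__":
    tree = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], lifetime=3.0, rng=random.Random(0))
    print(len(leaves(tree)), "cells; (0.3, 0.7) lies in", find_leaf(tree, [0.3, 0.7]))
```

Because each child starts its clock at its parent's split time, cut times increase along every root-to-leaf path, and a smaller `lifetime` yields a coarser partition.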
Keywords: Bayesian; random decision trees
Source: VideoLectures.NET
Data collected: 2020-10-29 (zyk)
Last reviewed: 2020-10-29 (zyk)