New Ensemble Methods for Evolving Data Streams
Course URL: http://videolectures.net/kdd09_bifet_nemeds/
Lecturer: Albert Bifet
Institution: Telecom ParisTech
Date: 2009-09-14
Language: English
Description: Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is, when concepts drift or change completely, is becoming one of the core issues. When tackling non-stationary concepts, ensembles of classifiers have several advantages over single-classifier methods: they are easy to scale and parallelize, they can adapt to change quickly by pruning under-performing parts of the ensemble, and they therefore usually also generate more accurate concept descriptions. This paper proposes a new experimental data stream framework for studying concept drift, and two new variants of Bagging: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. Using the new experimental framework, an evaluation study on synthetic and real-world datasets comprising up to ten million examples shows that the new ensemble methods perform very well compared to several known methods.
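The adaptation idea in the description (bagging over a stream and dropping under-performing ensemble members when the concept changes) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the ADWIN Bagging or ASHT Bagging implementation from the lecture: the class names, the `window` and `tolerance` parameters, the majority-class base learner, and the fixed-window change check are all hypothetical stand-ins chosen for brevity, in place of Hoeffding trees and the ADWIN detector. The Poisson(1) weighting is the usual online approximation of bootstrap resampling.

```python
import math
import random
from collections import deque


class MajorityClassLearner:
    """Toy base learner (assumption): predicts the most frequent label so far."""

    def __init__(self):
        self.counts = {}

    def learn(self, x, y, weight=1):
        self.counts[y] = self.counts.get(y, 0) + weight

    def predict(self, x):
        # x is ignored by this toy learner; returns None before any training.
        return max(self.counts, key=self.counts.get) if self.counts else None


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class OnlineBaggingWithReplacement:
    """Online bagging that replaces its worst member when a change is flagged."""

    def __init__(self, n_members=5, window=50, tolerance=0.2, seed=0):
        self.rng = random.Random(seed)
        self.window = window
        self.tolerance = tolerance
        self.members = [MajorityClassLearner() for _ in range(n_members)]
        self.recent = [deque(maxlen=window) for _ in range(n_members)]
        self.errors = [0] * n_members  # errors since the member was created
        self.seen = [0] * n_members    # examples since the member was created
        self.replacements = 0

    def predict(self, x):
        votes = {}
        for member in self.members:
            label = member.predict(x)
            if label is not None:
                votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get) if votes else None

    def _flags_change(self, i):
        # Simplified stand-in for a change detector: the recent error rate
        # exceeds the member's long-run error rate by more than `tolerance`.
        if len(self.recent[i]) < self.window:
            return False
        recent_rate = sum(self.recent[i]) / self.window
        long_run_rate = self.errors[i] / self.seen[i]
        return recent_rate - long_run_rate > self.tolerance

    def _replace_worst(self):
        # Only members with a full window are candidates, so a freshly
        # created member is not discarded before it has been evaluated.
        eligible = [i for i in range(len(self.members))
                    if len(self.recent[i]) == self.window]
        if not eligible:
            return
        worst = max(eligible, key=lambda i: sum(self.recent[i]))
        self.members[worst] = MajorityClassLearner()
        self.recent[worst].clear()
        self.errors[worst] = 0
        self.seen[worst] = 0
        self.replacements += 1

    def learn(self, x, y):
        change = False
        for i, member in enumerate(self.members):
            wrong = int(member.predict(x) != y)  # test first, then train
            self.recent[i].append(wrong)
            self.errors[i] += wrong
            self.seen[i] += 1
            k = poisson1(self.rng)               # online bootstrap weight
            if k > 0:
                member.learn(x, y, weight=k)
            change = change or self._flags_change(i)
        if change:
            self._replace_worst()
```

Feeding the ensemble a run of one label followed by a run of another (for example 300 examples labeled "a", then 100 labeled "b") shows the intended behavior: the members trained on the old concept are flagged and replaced one at a time, so the vote switches to the new label well before a single long-lived majority counter would.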
Keywords: data stream processing; advanced analysis; experimental framework
Source: VideoLectures.NET
Data collected: 2023-03-07:chenjy
Last reviewed: 2023-03-07:chenjy