Issues in Evaluation of Stream Learning Algorithms |
|
Course URL: | http://videolectures.net/kdd09_gama_iesla/ |
Lecturer: | Joao Gama |
Institution: | University of Porto |
Date: | 2009-09-14 |
Language: | English |
Abstract: | Learning from data streams is a research area of increasing importance. Several stream learning algorithms have now been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the data-generating environment. One important issue, not yet adequately addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. There is no gold standard for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of Predictive Sequential methods for error estimation: the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time. Nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimators we need some forgetting mechanism. Two viable alternatives are sliding windows and fading factors. We observe that the prequential error converges to a holdout estimator when estimated over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i. assessing the performance of a learning algorithm; ii. comparing learning algorithms; iii. hypothesis testing using the McNemar test; and iv. change detection using the Page-Hinkley test. In these tasks, the prequential error estimated using fading factors provides reliable estimators. In comparison to sliding windows, fading factors are faster and memoryless, a requirement for streaming applications. This paper contributes to the discussion of good practices in performance assessment when learning dynamic models that evolve over time. |
Keywords: | data streams; decision models; evaluation in non-stationary environments |
Source: | VideoLectures.NET |
Last reviewed: | 2019-05-10:lxf |
Views: | 75 |
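
The abstract names two forgetting mechanisms for the prequential error (sliding windows and fading factors) and the Page-Hinkley test for change detection, but gives no formulas. The following is a minimal Python sketch of how these are commonly implemented, not the lecture's own code. The class names, the default values of `alpha`, `size`, `delta`, and `threshold`, and the recursive form used for the fading factor are assumptions made for illustration.

```python
from collections import deque


class FadingPrequentialError:
    """Prequential error with a fading factor (assumed recursive form).

    S_i = loss_i + alpha * S_{i-1},  N_i = 1 + alpha * N_{i-1},  E_i = S_i / N_i.
    With alpha = 1.0 this reduces to the plain prequential error.
    """

    def __init__(self, alpha=0.99):
        self.alpha = alpha
        self.loss_sum = 0.0  # faded sum of losses
        self.count = 0.0     # faded number of examples

    def update(self, loss):
        self.loss_sum = loss + self.alpha * self.loss_sum
        self.count = 1.0 + self.alpha * self.count
        return self.loss_sum / self.count


class WindowPrequentialError:
    """Prequential error averaged over the most recent `size` losses."""

    def __init__(self, size=100):
        self.window = deque(maxlen=size)  # stores losses, unlike the fading factor

    def update(self, loss):
        self.window.append(loss)
        return sum(self.window) / len(self.window)


class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a monitored signal."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # alarm threshold, often written lambda
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation m_T
        self.min_cum = 0.0          # running minimum M_T

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


if __name__ == "__main__":
    # Hypothetical 0-1 loss stream: the model is always right, then always wrong.
    stream = [0.0] * 200 + [1.0] * 200
    faded = FadingPrequentialError(alpha=0.9)
    windowed = WindowPrequentialError(size=50)
    detector = PageHinkley()
    for i, loss in enumerate(stream):
        f = faded.update(loss)
        w = windowed.update(loss)
        if detector.update(loss):
            print(f"change signalled at example {i}: fading={f:.3f}, window={w:.3f}")
            break
```

The fading-factor estimator keeps only two numbers, while the sliding window must store its last `size` losses; this is the memory difference the abstract refers to.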