An Efficient Parameter-Free Method for Large Scale Offline Learning
Course URL: http://videolectures.net/icml08_boulle_aep/
Lecturer: Marc Boulle
Institution: France Telecom R&D
Date: 2008-09-01
Language: English
Course description: With the rapid growth of computer storage capacities, both the available data and the demand for scoring models are increasing faster than processing power. However, the main obstacle to the wide adoption of data mining solutions is the non-increasing availability of skilled data analysts, who play a key role in data preparation and model selection. In this paper we present a parameter-free, scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with datasets that are far larger than the available central memory. Finally, we report results on the Large Scale Learning challenge, where our method obtains state-of-the-art performance within practicable computation time.
Keywords: data mining; available data; Bayesian variable selection scheme
Source: VideoLectures.NET
Last reviewed: 2019-04-17:lxf
Views: 48
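
The course description names three components: univariate conditional density estimation, naive Bayes classification with variable selection, and model averaging. The sketch below is only a rough illustration of how such pieces fit together, not the method from the talk. It assumes equal-frequency binning in place of the Bayes optimal estimators, and it takes the per-variable weights as a given input rather than deriving them from variable selection and model averaging. The class name `WeightedNaiveBayes` and all parameter names are hypothetical.

```python
import math
from collections import Counter


def equal_frequency_bounds(values, n_bins):
    """Return bin boundaries so that each bin holds roughly equal counts."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def bin_index(value, bounds):
    """Index of the first bin whose upper boundary exceeds the value."""
    for i, bound in enumerate(bounds):
        if value < bound:
            return i
    return len(bounds)


class WeightedNaiveBayes:
    """Naive Bayes over binned variables with one weight per variable.

    Assumption: a weight in [0, 1] stands in for the outcome of variable
    selection and model averaging; 0 drops a variable, 1 keeps it fully.
    """

    def __init__(self, n_bins=4, weights=None):
        self.n_bins = n_bins
        self.weights = weights

    def fit(self, rows, labels):
        n_vars = len(rows[0])
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.total = len(labels)
        if self.weights is None:
            self.weights = [1.0] * n_vars
        # Assumption: equal-frequency bins replace the optimal discretization.
        self.bounds = [
            equal_frequency_bounds([row[j] for row in rows], self.n_bins)
            for j in range(n_vars)
        ]
        # counts[j][(bin, label)] = number of training rows in that cell.
        self.counts = [dict() for _ in range(n_vars)]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                key = (bin_index(value, self.bounds[j]), label)
                self.counts[j][key] = self.counts[j].get(key, 0) + 1
        return self

    def predict_proba(self, row):
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.total)
            for j, value in enumerate(row):
                b = bin_index(value, self.bounds[j])
                # Laplace-smoothed estimate of P(bin | class).
                p = (self.counts[j].get((b, c), 0) + 1) / (
                    self.class_counts[c] + self.n_bins
                )
                score += self.weights[j] * math.log(p)
            scores[c] = score
        # Normalize in log space to avoid underflow.
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        norm = sum(exps.values())
        return {c: v / norm for c, v in exps.items()}
```

Setting a weight to 0 removes a variable's contribution entirely, while fractional weights shrink it, which is one common way to express an averaged ensemble of naive Bayes models as a single classifier.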