


Robust Tree-Based Incremental Imputation Method for Data Fusion
Course URL: http://videolectures.net/ida07_dambrosio_rtbi/
Lecturer: Antonio D'Ambrosio
Institution: University of Naples
Date: 2007-10-08
Language: English
Abstract: Data Fusion and Data Grafting are concerned with combining files and information coming from different sources. The problem is not to extract data from a single database, but to merge information collected from different sample surveys. The typical data fusion situation is formed of two data samples: the former is a complete data matrix X relative to a first survey, and the latter is a matrix Y in which a certain number of variables are missing. The aim is to complete the matrix Y starting from the knowledge acquired from X; thus, the goal is to define the correlation structure that joins the two data matrices to be merged. In this paper, we provide an innovative methodology for Data Fusion based on an incremental imputation algorithm in tree-based models. In addition, we consider robust tree validation by boosting iterations. A relevant advantage of the proposed method is that it works for a mixed data structure including both numerical and categorical variables. As benchmark methods we consider explicit methods, such as standard trees and multiple regression, as well as an implicit method based on principal component analysis. An extensive simulation study shows that the proposed method is more accurate than the other methods.
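The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of incremental tree-based imputation in a data fusion setting, not the lecturer's method. It assumes a donor file (complete, like X) and a recipient file (like Y) that share some common variables and lack the specific ones. The missing variables are filled one at a time in a given order, and each newly imputed variable is added to the predictors used for the next one. All names (`grow`, `predict`, `incremental_fusion`) are hypothetical; the tree is a minimal least-squares regression tree for numeric variables only, and neither categorical splits nor the boosting-based robust validation mentioned in the abstract are implemented.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]


class Node:
    """A regression-tree node: a leaf (feature is None) or a binary split."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.feature: Optional[str] = None
        self.threshold = 0.0
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (the least-squares impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(rows: List[Row], target: str, features: List[str],
         max_depth: int = 3, min_leaf: int = 2) -> Node:
    """Grow a small least-squares regression tree (numeric features only)."""
    ys = [r[target] for r in rows]
    node = Node(mean(ys))
    if max_depth == 0 or len(rows) < 2 * min_leaf:
        return node
    best: Optional[Tuple[float, str, float]] = None
    for f in features:
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None or best[0] >= sse(ys):
        return node  # no split reduces impurity: keep as a leaf
    _, f, t = best
    node.feature, node.threshold = f, t
    node.left = grow([r for r in rows if r[f] <= t], target, features,
                     max_depth - 1, min_leaf)
    node.right = grow([r for r in rows if r[f] > t], target, features,
                      max_depth - 1, min_leaf)
    return node


def predict(node: Node, row: Row) -> float:
    """Follow the splits down to a leaf and return its mean value."""
    while node.feature is not None:
        nxt = node.left if row[node.feature] <= node.threshold else node.right
        assert nxt is not None
        node = nxt
    return node.value


def incremental_fusion(donor: List[Row], recipient: List[Row],
                       common: List[str], missing: List[str]) -> List[Row]:
    """Impute the `missing` variables of the recipient file one at a time.

    Each tree is grown on the donor file using the common variables plus
    the specific variables imputed so far, so later imputations can use
    earlier ones. The recipient rows are copied, not modified in place.
    """
    completed = [dict(r) for r in recipient]
    predictors = list(common)
    for var in missing:
        tree = grow(donor, var, predictors)
        for row in completed:
            row[var] = predict(tree, row)
        predictors.append(var)
    return completed
```

For example, with a donor file whose rows are `{"age": 1.0, "income": 10.0, "spend": 2.0}`, `{"age": 2.0, "income": 10.0, "spend": 2.0}`, `{"age": 8.0, "income": 90.0, "spend": 18.0}` and `{"age": 9.0, "income": 90.0, "spend": 18.0}`, calling `incremental_fusion(donor, [{"age": 1.5}, {"age": 8.5}], ["age"], ["income", "spend"])` fills `income` first and then uses both `age` and the imputed `income` as candidate predictors for `spend`.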
Keywords: Computer Science; Machine Learning; Preprocessing
Source: VideoLectures.NET
Last edited: 2020-09-28 (heyf)