Coresets for Kernel Regression
Course URL: http://videolectures.net/kdd2017_zheng_kernel_regression/
Lecturer: Yan Zheng
Institution: University of Utah
Date: 2017-10-09
Language: English
Abstract: Kernel regression is an essential and ubiquitous tool for non-parametric data analysis, particularly popular for time series and spatial data. However, its central operation, evaluating a kernel on the data set, is performed many times and takes time linear in the number of data points for each evaluation. This is impractical for modern large data sets. In this paper we describe coresets for kernel regression: compressed data sets that can be used as a proxy for the original data and have provably bounded worst-case error. The size of a coreset is independent of the raw number of data points; it depends only on the error guarantee and, in some cases, on the size of the domain and the amount of smoothing. We evaluate our methods on very large time series and spatial data sets, and demonstrate that they incur negligible error, can be constructed extremely efficiently, and allow for great computational gains.
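To make the cost described in the abstract concrete, here is a minimal Python sketch of kernel regression in its common Nadaraya-Watson form with a Gaussian kernel, where each query touches every data point. The function names, the synthetic sine data, the bandwidth, and the sample sizes are all assumptions made for illustration. In particular, `uniform_subsample` is only a naive stand-in for a compressed proxy: it is not the coreset construction from the lecture and carries none of its worst-case guarantees.

```python
import numpy as np

def kernel_regression(xs, ys, queries, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel at each query point.

    Each query is compared against every data point, so the cost per
    query is linear in len(xs).
    """
    # Pairwise differences, shape (num_queries, num_points).
    diffs = queries[:, None] - xs[None, :]
    weights = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    # Kernel-weighted average of the observed values.
    return (weights @ ys) / weights.sum(axis=1)

def uniform_subsample(xs, ys, size, seed=0):
    """Hypothetical stand-in for a proxy data set: a uniform random sample.

    NOT the lecture's coreset construction; it has no worst-case guarantee.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(xs), size=size, replace=False)
    return xs[idx], ys[idx]

# Synthetic one-dimensional data (assumed for illustration only).
rng = np.random.default_rng(42)
xs = rng.uniform(0.0, 10.0, size=20000)
ys = np.sin(xs) + rng.normal(0.0, 0.1, size=xs.size)
queries = np.linspace(1.0, 9.0, 50)

# Regression on the full data versus on the much smaller proxy.
full = kernel_regression(xs, ys, queries, bandwidth=0.5)
sub_xs, sub_ys = uniform_subsample(xs, ys, size=2000)
approx = kernel_regression(sub_xs, sub_ys, queries, bandwidth=0.5)

# Largest disagreement between the two estimates over all queries.
max_gap = np.max(np.abs(full - approx))
```

Evaluating on the proxy costs a tenth of the kernel evaluations per query in this setup, and `max_gap` measures how far the two estimates drift apart. A coreset in the abstract's sense aims for the same trade-off with a provable bound on that gap, which a random sample like this one does not provide.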
Keywords: kernel regression; central operation; compressed data sets
Source: VideoLectures.NET
Data collected: 2023-12-25:wujk
Last reviewed: 2023-12-25:wujk
Views: 15