Sparse Inverse Gaussian Process Regression with Application to Climate Network Discovery
Course URL: http://videolectures.net/cidu2011_das_regression/
Instructor: Kamalika Das
Institution: NASA
Date: 2012-06-27
Language: English
Course description: Regression problems on massive data sets are ubiquitous in many application domains, including the Internet, earth and space sciences, and finance. Gaussian Process regression is a popular technique for modeling the input-output relations of a set of variables under the assumption that the weight vector has a Gaussian prior. However, applying Gaussian Process regression to large data sets is challenging, since prediction based on the learned model requires inverting a kernel matrix of order n. Approximate solutions for sparse Gaussian Processes have been proposed for sparse problems. However, in almost all cases, these solution techniques are agnostic to the input domain and do not preserve the similarity structure in the data. As a result, although these solutions sometimes provide excellent accuracy, the resulting models are not interpretable. Interpretable sparsity patterns are very important for many applications. We propose a new technique for sparse Gaussian Process regression that allows us to compute a parsimonious model while preserving the interpretability of the sparsity structure in the data. We discuss how the inverse kernel matrix used in Gaussian Process prediction gives valuable domain information, and then adapt inverse covariance estimation from Gaussian graphical models to estimate the Gaussian kernel. We solve the optimization problem using the alternating direction method of multipliers, which is amenable to parallel computation. We demonstrate the performance of our method in terms of accuracy, scalability, and interpretability on a climate data set.
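The description names the ingredients (an inverse kernel matrix used for prediction, sparse inverse covariance estimation, and ADMM) but not the exact formulation. The sketch below is therefore only an illustration under stated assumptions: it uses the standard graphical-lasso objective with the textbook ADMM updates, which may differ from the speaker's method. The function names (`rbf_kernel`, `sparse_inverse_admm`, `gp_predict_mean`) and the parameters `lam`, `rho`, and `length_scale` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def sparse_inverse_admm(S, lam=0.1, rho=1.0, n_iter=200):
    """Estimate a sparse inverse of the symmetric positive definite matrix S.

    Assumed objective (standard graphical lasso, not taken from the lecture):
        minimize  trace(S X) - log det X + lam * ||Z||_1   subject to  X = Z
    The L1 penalty is applied to every entry, including the diagonal.
    """
    n = S.shape[0]
    Z = np.eye(n)
    U = np.zeros((n, n))  # scaled dual variable
    for _ in range(n_iter):
        # X-update: solve rho*X - inv(X) = rho*(Z - U) - S by eigendecomposition.
        eigvals, Q = np.linalg.eigh(rho * (Z - U) - S)
        x = (eigvals + np.sqrt(eigvals**2 + 4.0 * rho)) / (2.0 * rho)
        X = (Q * x) @ Q.T
        # Z-update: elementwise soft-thresholding, which produces exact zeros.
        A = X + U
        Z = np.sign(A) * np.maximum(np.abs(A) - lam / rho, 0.0)
        # Dual update.
        U = U + X - Z
    return Z


def gp_predict_mean(K_star, P, y):
    """GP predictive mean K_star @ P @ y, where P stands in for inv(K + noise*I)."""
    return K_star @ (P @ y)


if __name__ == "__main__":
    # Toy one-dimensional data, made up purely for illustration.
    X_train = np.linspace(0.0, 5.0, 30)[:, None]
    y_train = np.sin(X_train).ravel()
    K = rbf_kernel(X_train, X_train) + 0.1 * np.eye(30)

    P = sparse_inverse_admm(K, lam=0.01)
    X_new = np.array([[2.5]])
    print(gp_predict_mean(rbf_kernel(X_new, X_train), P, y_train))
    print("nonzero entries in P:", np.count_nonzero(P), "of", P.size)
```

In this sketch the zero pattern of the returned matrix is what would be read as a network: a zero entry means the two corresponding points are treated as conditionally independent given the rest. With `lam=0` the iteration converges to the ordinary inverse, and larger values of `lam` trade accuracy for sparsity.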
Keywords: large-scale data sets; Gaussian Process regression; sparse problems; climate data sets; computer science
Source: VideoLectures.NET
Last reviewed: 2020-09-17:chenxin
Views: 42