Covariate Shift by Kernel Mean Matching
Course URL: http://videolectures.net/nipsworkshops09_gretton_cskm/
Lecturer: Arthur Gretton
Institution: University College London
Date: 2010-01-19
Language: English
Description: Given sets of observations of training and test data, we consider the problem of re-weighting the training data such that its distribution more closely matches that of the test data. We achieve this goal by matching covariate distributions between training and test sets in a high dimensional feature space (specifically, a reproducing kernel Hilbert space). This approach does not require distribution estimation. Instead, the sample weights are obtained by a simple quadratic programming procedure. We first describe how distributions may be mapped to reproducing kernel Hilbert spaces. Next, we review distances between such mappings, and describe conditions under which the feature space mappings are injective (and thus, distributions have a unique mapping). Finally, we demonstrate how a transfer learning algorithm can be obtained by reweighting the training points such that their feature mean matches that of the (unlabeled) test distribution. Our correction procedure yields its greatest and most consistent advantages when the learning algorithm returns a classifier/regressor that is "simpler" than the data might suggest. On the other hand, even an ideal sample reweighting may not be of practical benefit given a sufficiently powerful classifier (if available).
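
The reweighting step described above reduces to a standard quadratic program: minimize 0.5 b'Kb - kappa'b over the weights b, subject to box and normalization constraints, where K is the kernel matrix on the training points and kappa measures each training point's kernel similarity to the test sample. Below is a minimal sketch of this kernel mean matching step, assuming a Gaussian RBF kernel and the cvxopt QP solver; the function names and the defaults for sigma, B, and eps are illustrative choices, not code from the lecture.

```python
# A minimal sketch of the kernel mean matching (KMM) reweighting QP.
# Assumptions: Gaussian RBF kernel, cvxopt as the QP solver, and
# illustrative defaults for sigma (kernel width), B (weight bound),
# and eps (normalization tolerance).
import numpy as np
from cvxopt import matrix, solvers

def rbf_kernel(X, Y, sigma):
    # Pairwise Gaussian kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def kmm_weights(X_tr, X_te, sigma=1.0, B=1000.0, eps=None):
    """Solve min_b 0.5 b'Kb - kappa'b  s.t.  0 <= b_i <= B and
    |sum(b) - n_tr| <= n_tr * eps, so that the weighted training
    feature mean matches the test feature mean in the RKHS."""
    n_tr, n_te = len(X_tr), len(X_te)
    if eps is None:
        eps = (np.sqrt(n_tr) - 1.0) / np.sqrt(n_tr)  # a common heuristic

    # Small ridge keeps K numerically positive semidefinite for the solver.
    K = rbf_kernel(X_tr, X_tr, sigma) + 1e-8 * np.eye(n_tr)
    kappa = (n_tr / n_te) * rbf_kernel(X_tr, X_te, sigma).sum(axis=1)

    # Stack the linear constraints as G b <= h.
    G = np.vstack([np.ones((1, n_tr)),    # sum(b) <= n_tr * (1 + eps)
                   -np.ones((1, n_tr)),   # -sum(b) <= n_tr * (eps - 1)
                   np.eye(n_tr),          # b_i <= B
                   -np.eye(n_tr)])        # -b_i <= 0
    h = np.hstack([n_tr * (1.0 + eps), n_tr * (eps - 1.0),
                   B * np.ones(n_tr), np.zeros(n_tr)])

    solvers.options['show_progress'] = False
    sol = solvers.qp(matrix(K), matrix(-kappa), matrix(G), matrix(h))
    return np.array(sol['x']).ravel()

# Toy usage: the test sample is shifted relative to the training sample,
# so KMM should upweight training points lying where test density is high.
rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(200, 1))
X_te = rng.normal(0.5, 1.0, size=(100, 1))
weights = kmm_weights(X_tr, X_te, sigma=1.0)
```

In this toy run the returned weights should, on average, be larger for training points near the test mean, which is exactly the feature-mean matching the description refers to; the normalization constraint keeps the weights close to a valid reweighting of the empirical training distribution.
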
Keywords: covariate distribution; quadratic programming procedure; sample weight adjustment
Source: VideoLectures.NET
Last reviewed: 2020-05-31 by 王勇彬 (volunteer course editor)
Views: 410