

Classification with Asymmetric Label Noise: Consistency and Maximal Denoising
Course URL: http://videolectures.net/colt2013_blanchard_noise/
Lecturer: Gilles Blanchard
Institution: University of Potsdam, Germany
Date: 2013-08-09
Language: English
Course description: In many real-world classification problems, the labels of training examples are randomly corrupted. Thus, the set of training examples for each class is contaminated by examples of the other class. Previous theoretical work on this problem assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. We introduce a general framework for classification with label noise that eliminates these assumptions. Instead, we give assumptions ensuring identifiability and the existence of a consistent estimator of the optimal risk, with associated estimation strategies. For any pair of contaminated distributions, there is a unique pair of non-contaminated distributions satisfying the proposed assumptions, and we argue that this solution corresponds in a certain sense to maximal denoising. In particular, we find that learning in the presence of label noise is possible even when the class-conditional distributions overlap and the label noise is not symmetric. A key to our approach is a universally consistent estimator of the maximal proportion of one distribution that is present in another, a problem we refer to as "mixture proportion estimation." This work is motivated by a problem in nuclear particle classification.
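Keywords: training samples; training example sets; mixture proportion estimation; asymmetric labels
Source: VideoLectures.NET open course
Last reviewed: 2019-05-26 (cwx)
Views: 81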
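
The "maximal proportion of one distribution that is present in another" named in the course description can be illustrated on a toy case. The sketch below is not the estimator from the lecture, which works from samples. It assumes both distributions are fully known and have finite support, in which case the largest alpha such that f = (1 - alpha) * g + alpha * h for some distribution g equals the smallest ratio f(x) / h(x) over outcomes x with h(x) > 0. The function name `max_mixture_proportion` and the example distributions are hypothetical.

```python
from fractions import Fraction


def max_mixture_proportion(f, h):
    """Largest alpha with f = (1 - alpha) * g + alpha * h for some distribution g.

    f and h map outcomes to exact probabilities (finite support, known in
    advance). This is the population quantity only, not a sample-based estimator.
    """
    # f - alpha * h must stay non-negative on every outcome where h has mass,
    # so alpha is bounded by the smallest ratio f(x) / h(x).
    ratios = [Fraction(f.get(x, 0)) / Fraction(p) for x, p in h.items() if p > 0]
    return min(ratios)


# Toy example (assumed values): the clean distribution g puts all mass on "a",
# and the contaminated one is f = 0.7 * g + 0.3 * h.
h = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
f = {"a": Fraction(17, 20), "b": Fraction(3, 20)}

alpha = max_mixture_proportion(f, h)
# Removing the maximal share of h from f and renormalising gives the residual g.
g = {x: (f[x] - alpha * h[x]) / (1 - alpha) for x in f}
print(alpha, g)
```

In this toy case the recovered proportion matches the true contamination level because g has no mass on outcome "b", so no further share of h can be removed from g itself. That mirrors the maximal denoising idea in the description, under the simplifying assumptions stated above.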