Learning from Candidate Labeling Sets
Course URL: http://videolectures.net/nips2010_luo_lcls/
Lecturer: Jie Luo
Institution: IDIAP Research Institute
Date: 2011-03-25
Language: English
Course description: In many real-world applications we do not have access to fully labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However, most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problem. Each training sample is a bag containing multiple instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world dataset of images and captions show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
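The bag and candidate-labeling structure described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the lecture: the names `Bag`, `labeling_score`, `best_candidate`, and `candidate_margin_loss` are hypothetical, the model is a plain per-class linear scorer, and the hinge-style loss only follows the general large margin idea (the best candidate labeling should outscore the best non-candidate labeling by a margin of 1). It enumerates all labelings by brute force, so it is only usable for tiny bags and is not the efficient algorithm mentioned in the description.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass
class Bag:
    """One training sample: several instances plus candidate labeling vectors."""
    instances: List[List[float]]       # one feature vector per instance
    candidates: List[Tuple[int, ...]]  # each tuple assigns one label per instance


def labeling_score(weights, bag, labeling):
    """Average linear score of a bag under one labeling vector (assumed model)."""
    total = 0.0
    for x, y in zip(bag.instances, labeling):
        total += sum(w * xi for w, xi in zip(weights[y], x))
    return total / len(bag.instances)


def best_candidate(weights, bag):
    """Pick the candidate labeling vector that the current model scores highest."""
    return max(bag.candidates, key=lambda z: labeling_score(weights, bag, z))


def candidate_margin_loss(weights, bag):
    """Hinge-style loss: best candidate vs. best non-candidate labeling (illustrative)."""
    num_classes = len(weights)
    allowed = set(bag.candidates)
    # Brute-force enumeration of every labeling outside the candidate set.
    others = [z for z in product(range(num_classes), repeat=len(bag.instances))
              if z not in allowed]
    if not others:
        return 0.0  # every labeling is a candidate, so there is nothing to separate
    best_in = max(labeling_score(weights, bag, z) for z in bag.candidates)
    best_out = max(labeling_score(weights, bag, z) for z in others)
    return max(0.0, 1.0 + best_out - best_in)


# Toy data (made up): two instances, two classes, two candidate labelings.
weights = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per class
bag = Bag(instances=[[2.0, 0.0], [0.0, 3.0]],
          candidates=[(0, 1), (1, 0)])

print(best_candidate(weights, bag))
print(candidate_margin_loss(weights, bag))
```

In this toy setup the candidate set says only that the two instances carry different labels, without saying which is which; the scorer resolves that ambiguity by choosing the labeling it rates highest.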
Keywords: label lists; text captions; semi-supervised framework
Source: VideoLectures.NET
Last reviewed: 2019-11-01:lxf
Views: 31