10 Year Best Paper: Combining Labeled and Unlabeled Data with Co-Training
Course URL: http://videolectures.net/icml08_shavlik_clud/
Lecturer: Jude W. Shavlik
Institution: University of Wisconsin
Date: 2008-07-24
Language: English
Course description: We consider the problem of using a large unlabeled sample to boost the performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views. For example, the description of a web page can be partitioned into the words occurring on that page and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm's predictions on new unlabeled examples are used to enlarge the training set of the other. Our goal in this paper is to provide a PAC-style analysis for this setting and, more broadly, a PAC-style framework for the general problem of learning from both labeled and unlabeled data. We also provide empirical results on real web-page data indicating that this use of unlabeled examples can lead to significant improvement of hypotheses in practice.

Restricted Boltzmann Machines (RBMs) have been developed for a large variety of learning problems. However, RBMs are usually used as feature extractors for another learning algorithm or to provide a good initialization for deep feed-forward neural network classifiers, and are not considered a stand-alone solution to classification problems. In this paper, we argue that RBMs provide a self-contained framework for deriving competitive non-linear classifiers. We present an evaluation of different learning algorithms for RBMs which aim at introducing a discriminative component to RBM training and improving their performance as classifiers. This approach is simple in that RBMs are used directly to build a classifier, rather than as a stepping stone. Finally, we demonstrate how discriminative RBMs can also be successfully employed in a semi-supervised setting.
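The co-training strategy described above, where two learners each pseudo-label their most confident unlabeled examples for the other, can be sketched as follows. This is a minimal illustrative reconstruction under my own assumptions (synthetic two-view data, a toy nearest-centroid learner, margin-based confidence), not the paper's original algorithm or code.

```python
# Illustrative co-training sketch; data, learner, and pool sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-view data: both views carry signal about the label, standing in
# for "words on the page" and "words in inbound hyperlinks".
n = 400
y = np.tile([0, 1], n // 2)
view1 = 2.0 * y[:, None] + rng.normal(size=(n, 5))
view2 = 2.0 * y[:, None] + rng.normal(size=(n, 5))

def fit_centroids(X, labels):
    """Toy stand-in learner: one centroid per class."""
    labels = np.asarray(labels)
    return np.stack([X[labels == c].mean(axis=0) for c in (0, 1)])

def margins(X, centroids):
    """Signed confidence: > 0 predicts class 1; magnitude is confidence."""
    d = np.linalg.norm(X[:, None, :] - centroids[None], axis=2)
    return d[:, 0] - d[:, 1]

labeled = list(range(20))              # small labeled set L
unlabeled = list(range(20, n))         # large unlabeled pool U
idx1, lab1 = labeled[:], list(y[:20])  # training set for learner 1 (view 1)
idx2, lab2 = labeled[:], list(y[:20])  # training set for learner 2 (view 2)

for _ in range(10):                    # co-training rounds
    c1 = fit_centroids(view1[idx1], lab1)
    c2 = fit_centroids(view2[idx2], lab2)
    pool = unlabeled[:50]              # small working pool of unlabeled examples
    m1 = margins(view1[pool], c1)
    m2 = margins(view2[pool], c2)
    # Each learner pseudo-labels its most confident pooled example and hands
    # it to the *other* learner's training set.
    j1, j2 = int(np.argmax(np.abs(m1))), int(np.argmax(np.abs(m2)))
    p1, p2 = pool[j1], pool[j2]
    idx2.append(p1); lab2.append(int(m1[j1] > 0))
    idx1.append(p2); lab1.append(int(m2[j2] > 0))
    for p in {p1, p2}:
        unlabeled.remove(p)
```

The key design point from the abstract is visible in the loop: predictions flow across views (learner 1 enlarges learner 2's training set and vice versa), which is what lets the cheap unlabeled pool augment the small labeled set.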
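The claim that an RBM is a self-contained classifier rests on the fact that, for an RBM with an extra class unit, the conditional p(y|x) has a closed form and needs no sampling. A hedged sketch of that predictive rule follows; the variable names, shapes, and random parameters are my assumptions for illustration (an actual discriminative RBM would train these weights, e.g. by gradient descent on -log p(y|x)).

```python
# Hedged sketch of the discriminative-RBM predictive rule with assumed,
# untrained random parameters; shapes and names (W, U, b_hidden, b_class)
# are illustrative choices, not the paper's notation.
import numpy as np

rng = np.random.default_rng(1)
n_visible, n_hidden, n_classes = 6, 4, 2

W = rng.normal(size=(n_hidden, n_visible))   # input-to-hidden weights
U = rng.normal(size=(n_hidden, n_classes))   # class-to-hidden weights
b_hidden = np.zeros(n_hidden)
b_class = np.zeros(n_classes)

def softplus(a):
    # log(1 + exp(a)), computed stably
    return np.logaddexp(0.0, a)

def predict_proba(x):
    # log p(y|x) = b_class[y] + sum_j softplus(b_hidden[j] + U[j, y] + W[j] @ x)
    # up to a normalizing constant shared across classes.
    logits = b_class + softplus(b_hidden[:, None] + U + (W @ x)[:, None]).sum(axis=0)
    logits -= logits.max()               # stabilize before exponentiating
    p = np.exp(logits)
    return p / p.sum()

x = rng.normal(size=n_visible)
p = predict_proba(x)                     # a proper distribution over classes
```

Because p(y|x) is exact and cheap, the abstract's "discriminative component" amounts to training the same parameters directly against this conditional, so the RBM serves as the classifier itself rather than as a feature extractor for another model.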
Keywords: web pages; labeled data; non-linear classifiers
Course source: VideoLectures.NET
Last reviewed: 2020-07-14:yumf
Views: 59