Scalable Active Learning by Approximated Error Reduction
Course URL: http://videolectures.net/kdd2018_fu_scalable_approximated/
Lecturer: Weijie Fu
Institution: Hefei University of Technology
Date: 2018-11-23
Language: English
Abstract: We study active learning for multi-class classification on large-scale datasets. In this setting, existing active learning approaches built on uncertainty measures are ineffective at discovering unknown regions, and those based on expected error reduction are inefficient owing to their high time cost. To overcome these issues, the paper proposes a novel query selection criterion called approximated error reduction (AER). In AER, the error reduction of each candidate is estimated from its expected impact over all datapoints and an approximated ratio between the error reduction and the impact over its nearby datapoints. In particular, hierarchical anchor graphs are used to construct the candidate set as well as the nearby-datapoint sets of these candidates. This strategy enables a hierarchical expansion of candidates as the number of labels grows, and it further accelerates the AER estimation. Finally, AER is introduced into an efficient semi-supervised classifier for scalable active learning. Experiments on publicly available datasets with sizes ranging from thousands to millions demonstrate the effectiveness of the approach.
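The notion of scoring a candidate by its expected impact over all datapoints can be illustrated with a toy sketch. This is not the paper's AER: it omits the approximated ratio and the hierarchical anchor graphs, and it swaps in a simple Gaussian-kernel vote as the classifier. All names here (`predict_proba`, `expected_impact`, `select_query`, `bandwidth`, `prior`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def predict_proba(X, labeled_idx, labels, n_classes, bandwidth=1.0, prior=1e-3):
    """Soft labels for every point from a Gaussian-kernel vote of the labeled points.

    `prior` is an assumed smoothing constant: points far from every label
    fall back to a near-uniform prediction.
    """
    diff = X[:, None, :] - X[labeled_idx][None, :, :]
    d2 = (diff ** 2).sum(axis=2)                     # (n, n_labeled) squared distances
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))         # kernel weights
    onehot = np.eye(n_classes)[labels]               # (n_labeled, n_classes)
    scores = w @ onehot + prior
    return scores / scores.sum(axis=1, keepdims=True)

def expected_impact(X, labeled_idx, labels, candidate, n_classes):
    """Expected total change in predictions over all points if `candidate` were labeled."""
    before = predict_proba(X, labeled_idx, labels, n_classes)
    impact = 0.0
    for y in range(n_classes):
        after = predict_proba(X, labeled_idx + [candidate], labels + [y], n_classes)
        # Weight each hypothetical label by the current belief in it.
        impact += before[candidate, y] * np.abs(after - before).sum()
    return impact

def select_query(X, labeled_idx, labels, candidates, n_classes):
    """Return the candidate with the largest expected impact."""
    scores = [expected_impact(X, labeled_idx, labels, c, n_classes) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy data: three well-separated clusters of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in centers])

labeled_idx = [0, 20]      # one label in cluster 0, one in cluster 1
labels = [0, 1]
candidates = [1, 45]       # index 1: already-covered cluster; index 45: unlabeled cluster

query = select_query(X, labeled_idx, labels, candidates, n_classes=3)
print("selected candidate:", query)
```

In this setup the candidate inside the cluster that has no labels yet should receive the larger score, since labeling it changes the predictions of many nearby points. Note that the sketch recomputes predictions for every candidate and every possible label, which is the kind of cost the abstract describes as prohibitive at scale.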
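Keywords: large-scale datasets; active learning for multi-class classification; approximated error reduction; publicly available datasets
Source: VideoLectures.NET
Data collected: 2023-01-30:cyh
Last reviewed: 2023-01-30:cyh
Views: 21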