Learning Dynamics of Decision Boundaries without Additional Labeled Data
Course URL: http://videolectures.net/kdd2018_kumagai_dynamics_decision/
Lecturer: Atsutoshi Kumagai
Institution: NTT (Nippon Telegraph and Telephone Corporation)
Date: 2018-11-23
Language: English
Abstract: We propose a method for learning the dynamics of the decision boundary to maintain classification performance without additional labeled data. In various applications, such as spam-mail classification, the decision boundary changes dynamically over time. Accordingly, the performance of classifiers deteriorates quickly unless they are retrained using additional labeled data. However, continuously preparing such data is expensive or impossible. The proposed method alleviates this deterioration by using newly obtained unlabeled data, which are easy to prepare, together with labeled data collected beforehand. In the proposed method, the dynamics of the decision boundary are modeled by Gaussian processes. To exploit information about the decision boundary contained in unlabeled data, the method assumes the low-density separation criterion: the decision boundary should not cross high-density regions but should instead lie in low-density regions. We incorporate this criterion into our framework in a principled manner by applying entropic posterior regularization to the posterior of the classifier parameters, based on the generic regularized Bayesian framework. We develop an efficient inference algorithm for the model based on variational Bayesian inference. The effectiveness of the proposed method is demonstrated through experiments on two synthetic and four real-world data sets.
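The abstract combines two ingredients. The first is a Gaussian-process model of how classifier parameters drift over time. The second is a low-density separation criterion, expressed as an entropy penalty on predictions for unlabeled data. The toy sketch below illustrates both ideas under loudly stated assumptions. It is not the authors' algorithm. It uses a hypothetical 1-D logistic classifier `sigmoid(w*x + b)`, with an independent zero-mean GP and RBF kernel for each parameter. It also replaces the paper's variational Bayesian inference and entropic posterior regularization with simple MAP-style gradient descent. The names `gp_mean`, `mean_entropy` and `adapt`, along with every hyperparameter value, are illustrative choices only.

```python
import math

# Hypothetical toy sketch (not the paper's method): a 1-D logistic classifier
# p(y=1|x) = sigmoid(w*x + b); each parameter's trajectory gets its own zero-mean GP.

def rbf(t1, t2, length_scale=1.0, signal_var=1.0):
    # Squared-exponential covariance between two time stamps.
    return signal_var * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)

def solve(a, y):
    # Gaussian elimination with partial pivoting; adequate for tiny systems.
    n = len(y)
    m = [list(row) + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gp_mean(times, values, t_star, noise_var=1e-2, **kern):
    # GP posterior mean of one parameter at time t_star, given its past values.
    k = [[rbf(ti, tj, **kern) + (noise_var if i == j else 0.0)
          for j, tj in enumerate(times)] for i, ti in enumerate(times)]
    alpha = solve(k, values)
    return sum(rbf(t_star, ti, **kern) * a for ti, a in zip(times, alpha))

def sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)

def mean_entropy(w, b, xs):
    # Average prediction entropy on unlabeled points: small when the boundary
    # x = -b/w lies in a low-density gap, large when it cuts through a cluster.
    total = 0.0
    for x in xs:
        p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)
        total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total / len(xs)

def adapt(w0, b0, xs, lam=1.0, prior_var=1.0, lr=0.1, steps=500):
    # MAP-style gradient descent on
    #   ((w - w0)^2 + (b - b0)^2) / (2 * prior_var) + lam * mean_entropy(w, b, xs),
    # i.e. stay near the GP forecast while moving the boundary out of dense regions.
    w, b = w0, b0
    for _ in range(steps):
        gw = (w - w0) / prior_var
        gb = (b - b0) / prior_var
        for x in xs:
            a = w * x + b
            p = sigmoid(a)
            g = -a * p * (1.0 - p) * lam / len(xs)  # d(entropy)/da = -a * p * (1 - p)
            gw += g * x
            gb += g
        w -= lr * gw
        b -= lr * gb
    return w, b

# Hypothetical past parameters at t = 0, 1, 2 (boundary x = -b/w drifts right).
times, ws, bs = [0.0, 1.0, 2.0], [2.0, 2.0, 2.0], [0.0, -1.0, -2.0]
w_pred = gp_mean(times, ws, 3.0)
b_pred = gp_mean(times, bs, 3.0)
unlabeled_t3 = [-0.4, -0.2, 0.0, 2.6, 2.8, 3.0]  # new unlabeled data only
w_new, b_new = adapt(w_pred, b_pred, unlabeled_t3)
print("forecast boundary:", -b_pred / w_pred, "adapted boundary:", -b_new / w_new)
```

In this sketch, the GP forecast plays the role of the prior, and the entropy term pulls the boundary toward the gap between the two unlabeled clusters. Because the GP prior has zero mean, long-range extrapolations shrink toward zero. A nonzero mean function or a learned kernel would be natural refinements, but these are assumptions rather than details from the talk.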
Keywords: learning decision-boundary dynamics; spam-mail classification; crossing high-density regions; variational Bayesian inference
Source: VideoLectures.NET
Data collected: 2023-01-30 (cyh)
Last edited: 2023-01-30 (cyh)