Using Graph-based Metrics with Empirical Risk Minimization to Speed Up Active Learning on Networked Data
Course URL: http://videolectures.net/kdd09_macskassy_ugbmermsualnd/
Lecturer: Sofus A Macskassy
Institution: Facebook
Date: 2009-09-14
Language: English
Course description: Active and semi-supervised learning are important techniques when labeled data are scarce. Recently, a method was suggested for combining active learning with a semi-supervised learning algorithm that uses Gaussian fields and harmonic functions. This classifier is relational in nature: it relies on having the data presented as a partially labeled graph (also known as a within-network learning problem). This work showed yet again that empirical risk minimization (ERM) was the best method to find the next instance to label and provided an efficient way to compute ERM with the semi-supervised classifier. The computational problem with ERM is that it relies on computing the risk for all possible instances. If we could limit the candidates that should be investigated, then we could speed up active learning considerably. In the case where the data are graphical in nature, we can leverage the graph structure to rapidly identify instances that are likely to be good candidates for labeling. This paper describes a novel hybrid approach that uses community finding and social network analytic centrality measures to identify good candidates for labeling and then uses ERM to find the best instance in this candidate set. We show on real-world data that we can limit the ERM computations to a fraction of the instances with comparable performance.
Keywords: semi-supervised learning; harmonic functions; empirical risk minimization; risk computation
Source: VideoLectures.NET
Last reviewed: 2020-01-13: chenxin
Views: 47
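
The hybrid idea in the course description (shortlist candidates from graph structure, then run ERM only on that shortlist) can be illustrated with a minimal sketch. This is not the speaker's implementation. It assumes a binary labeling task, a connected graph given as a symmetric weight matrix `W`, plain degree centrality as a stand-in for the community-finding and centrality measures mentioned in the talk, and a naive re-solve of the harmonic function for each hypothetical label instead of an efficient update. All function names are hypothetical.

```python
import numpy as np

def harmonic_solution(W, labeled, y):
    """Harmonic-function estimate of P(class = 1) for every node.

    Labeled nodes keep their given 0/1 labels. Assumes every unlabeled
    node is connected to at least one labeled node.
    """
    n = W.shape[0]
    labeled = list(labeled)
    labeled_set = set(labeled)
    unlabeled = [i for i in range(n) if i not in labeled_set]
    f = np.zeros(n)
    f[labeled] = y
    if unlabeled:
        laplacian = np.diag(W.sum(axis=1)) - W
        L_uu = laplacian[np.ix_(unlabeled, unlabeled)]
        W_ul = W[np.ix_(unlabeled, labeled)]
        f[unlabeled] = np.linalg.solve(L_uu, W_ul @ np.asarray(y, dtype=float))
    return f

def risk(f):
    """Estimated risk: sum over nodes of min(f_i, 1 - f_i)."""
    return float(np.minimum(f, 1.0 - f).sum())

def expected_risk(W, labeled, y, k, f):
    """Expected risk after querying node k, weighting each outcome by f[k]."""
    total = 0.0
    for label, prob in ((0.0, 1.0 - f[k]), (1.0, f[k])):
        f_new = harmonic_solution(W, list(labeled) + [k], list(y) + [label])
        total += prob * risk(f_new)
    return total

def top_degree_candidates(W, labeled, m):
    """Shortlist the m unlabeled nodes with the highest weighted degree."""
    degree = W.sum(axis=1)
    labeled_set = set(labeled)
    unlabeled = [i for i in range(W.shape[0]) if i not in labeled_set]
    return sorted(unlabeled, key=lambda i: -degree[i])[:m]

def select_query(W, labeled, y, m):
    """Run ERM only over the shortlist and return the node to label next."""
    f = harmonic_solution(W, labeled, y)
    candidates = top_degree_candidates(W, labeled, m)
    return min(candidates, key=lambda k: expected_risk(W, labeled, y, k, f))

# Usage: a 6-node graph with edges 0-1, 1-2, 2-3, 3-4, 2-5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]
W = np.zeros((6, 6))
for a, b in edges:
    W[a, b] = W[b, a] = 1.0
next_node = select_query(W, labeled=[0, 4], y=[0.0, 1.0], m=2)
```

With `m` much smaller than the number of unlabeled nodes, the expensive `expected_risk` call runs only `m` times per query instead of once per unlabeled node, which is the saving the description refers to.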