Prototype Vector Machine for Large Scale Semi-Supervised Learning
Course URL: http://videolectures.net/icml09_zhang_pvmlsssl/
Lecturer: Kai Zhang
Institution: Temple University
Date: 2009-08-26
Language: English
Course description: Practical data analysis and mining rarely fall exactly into the supervised learning scenario. Rather, the growing amount of unlabelled data from various scientific domains poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which may in turn lead to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of “prototype vectors” for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded on two important criteria: they not only perform an effective low-rank approximation of the kernel matrix, but also span a model that suffers minimal information loss compared with the complete model. These criteria lead to a consistent prototype selection scheme, allowing us to design a unified algorithm (PVM) that demonstrates encouraging performance while possessing appealing scaling properties (empirically linear in the sample size).
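The low-rank approximation of the kernel matrix mentioned in the description can be illustrated with a small sketch. This is not the authors' implementation: it assumes an RBF kernel, takes plain k-means centers as the prototypes, and uses a Nyström-style reconstruction K ≈ E W⁺ Eᵀ, where E holds kernel values between samples and prototypes and W holds kernel values among the prototypes. The function names (`rbf_kernel`, `kmeans_prototypes`, `low_rank_kernel`, `demo`), the `gamma` value, and the synthetic data are all hypothetical choices made for illustration; the graph regularizer and the classifier itself are not covered.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel values between every row of A and every row of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kmeans_prototypes(X, m, n_iter=20, seed=0):
    """Pick m prototypes as k-means centers (an assumed selection rule)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # Move each non-empty center to the mean of its members.
        for j in range(m):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers


def low_rank_kernel(X, prototypes, gamma=0.5):
    """Nystrom-style approximation K ~ E pinv(W) E^T built from prototypes."""
    E = rbf_kernel(X, prototypes, gamma)           # n x m
    W = rbf_kernel(prototypes, prototypes, gamma)  # m x m
    return E @ np.linalg.pinv(W) @ E.T


def demo():
    """Compare the full kernel with its prototype-based approximation."""
    rng = np.random.default_rng(0)
    X = np.vstack([
        rng.normal(0.0, 0.3, size=(100, 2)),
        rng.normal(3.0, 0.3, size=(100, 2)),
    ])
    K = rbf_kernel(X, X)
    P = kmeans_prototypes(X, m=10)
    K_approx = low_rank_kernel(X, P)
    rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
    return K.shape, K_approx.shape, rel_err
```

Calling `demo()` returns the shapes of the full and approximated kernel matrices and the relative Frobenius error. Only the n x m and m x m blocks need to be computed to form the approximation, which is the source of the scaling benefit the description alludes to.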
Keywords: data analysis; supervised learning; computational intensity
Source: VideoLectures.NET
Last reviewed: 2019-04-25 (cwx)
Views: 34