Visual Event Recognition in Videos by Learning from Web Data
Course URL: http://videolectures.net/cvpr2010_duan_verv/
Lecturer: Lixin Duan
Institution: Nanyang Technological University
Date: 2010-07-19
Language: English
Abstract: We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distance between two video clips, where each video clip is divided into space-time volumes over multiple levels. We calculate the pairwise distances between any two volumes and further integrate the information from different volumes with Integer-flow Earth Mover’s Distance (EMD) to explicitly align the volumes. Second, we propose a new cross-domain learning method in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time features and static SIFT features) and 2) cope with the considerable variation in feature distributions between videos from the two domains (i.e., the web domain and the consumer domain). For each pyramid level and each type of local feature, we train a set of SVM classifiers on the combined training set from both domains, using multiple base kernels of different kernel types and parameters; these classifiers are fused with equal weights to obtain an average classifier. Finally, we propose a cross-domain learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), which learns an adapted classifier based on multiple base kernels and the prelearned average classifiers by minimizing both the structural risk functional and the mismatch between the data distributions of the two domains. Extensive experiments demonstrate the effectiveness of the proposed framework, which requires only a small number of labeled consumer videos because it leverages web data.
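The volume-alignment step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes both clips are split into the same number R of space-time volumes, each with equal weight 1/R, and that the pairwise volume distances d[r][c] have already been computed (how they are computed is left out). Under those assumptions, restricting flows to integers makes every feasible flow a one-to-one matching between volumes, so the Integer-flow EMD equals the average cost of the cheapest matching. The function name aligned_volume_distance is hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def aligned_volume_distance(d: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Integer-flow EMD between two clips with R equally weighted volumes each.

    Hypothetical helper. d[r][c] is an assumed, precomputed distance between
    volume r of clip 1 and volume c of clip 2. With flows restricted to {0, 1}
    and each volume sending/receiving exactly one unit, a feasible flow is a
    permutation, so the EMD is the mean cost of the cheapest permutation.
    Returns (distance, matching) where matching[r] is the volume aligned to r.
    """
    R = len(d)
    if R == 0 or any(len(row) != R for row in d):
        raise ValueError("d must be a non-empty R x R matrix")

    best_cost, best_match = float("inf"), tuple(range(R))
    # Brute force over all R! matchings; only practical for small R.
    for perm in permutations(range(R)):
        cost = sum(d[r][perm[r]] for r in range(R))
        if cost < best_cost:
            best_cost, best_match = cost, perm

    # EMD = sum(f * d) / sum(f); with R unit flows the denominator is R.
    return best_cost / R, best_match
```

For example, aligned_volume_distance([[1.0, 5.0], [4.0, 2.0]]) returns (1.5, (0, 1)): keeping volumes 0 and 1 in place costs 3, versus 9 for the swap. Brute force enumerates all R! matchings, so for larger R an assignment solver such as the Hungarian algorithm is the practical choice; it finds the same optimum.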
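The abstract does not say how the mismatch between the web and consumer data distributions is measured. A common choice in kernel-based domain adaptation is the Maximum Mean Discrepancy (MMD), the squared distance between the two domains' mean feature embeddings, and the sketch below assumes that choice. Writing s for the vector with entries 1/n_A on the n_A web samples and -1/n_T on the n_T consumer samples, MMD^2 = s^T K s. For a combined kernel k = sum_m d_m k_m with non-negative weights d_m, this becomes sum_m d_m (s^T K_m s), which is linear in the kernel weights and therefore fits a multiple-kernel setting. The Gaussian and linear base kernels and all function names are hypothetical illustrations, not the paper's kernels or code.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def gaussian(gamma: float) -> Kernel:
    """Hypothetical base kernel: exp(-gamma * ||x - y||^2)."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def linear(x: Vector, y: Vector) -> float:
    """Hypothetical base kernel: plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def mmd_terms(source: List[Vector], target: List[Vector], kernels: List[Kernel]) -> List[float]:
    """p_m = s^T K_m s: squared MMD between the domains under each base kernel."""
    if not source or not target:
        raise ValueError("both domains need at least one sample")
    pts = list(source) + list(target)
    s = [1.0 / len(source)] * len(source) + [-1.0 / len(target)] * len(target)
    terms = []
    for k in kernels:
        total = 0.0
        for i, xi in enumerate(pts):
            for j, xj in enumerate(pts):
                total += s[i] * s[j] * k(xi, xj)
        terms.append(total)
    return terms


def combined_mmd(source: List[Vector], target: List[Vector],
                 kernels: List[Kernel], weights: List[float]) -> float:
    """Squared MMD under k = sum_m d_m k_m, computed as sum_m d_m * p_m."""
    if len(weights) != len(kernels):
        raise ValueError("need one weight per base kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * p for w, p in zip(weights, mmd_terms(source, target, kernels)))
```

With the linear kernel on 1-D points, the MMD reduces to the squared difference of the domain means: a web sample [1, 3] and a consumer sample [5] give (2 - 5)^2 = 9.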
Keywords: web videos; vision; Adaptive Multiple Kernel Learning (A-MKL)
Source: VideoLectures.NET
Last reviewed: 2019-03-13 (lxf)