

Human Parsing With Contextualized Convolutional Neural Network
Course URL: http://videolectures.net/iccv2015_liang_human_parsing/
Lecturer: Xiaodan Liang
Institution: Department of Electrical and Computer Engineering, National University of Singapore
Date: 2016-02-10
Language: English
Description: In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture. It integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Given an input human image, Co-CNN produces a pixel-wise categorization in an end-to-end way. First, the cross-layer context is captured by our basic local-to-global-to-local structure, which hierarchically combines the global semantic information and the local fine details across different convolutional layers. Second, the global image-level label prediction is used as an auxiliary objective in an intermediate layer of the Co-CNN. Its outputs are further used to guide the feature learning in subsequent convolutional layers, so that the network leverages the global image-level context. Finally, to further utilize the local super-pixel contexts, within-super-pixel smoothing and cross-super-pixel neighborhood voting are formulated as natural subcomponents of the Co-CNN. These subcomponents achieve local label consistency in both training and testing. Comprehensive evaluations on two public datasets demonstrate the significant superiority of our Co-CNN over other state-of-the-art methods for human parsing. In particular, Co-CNN reaches an F-1 score of 76.95% on the large dataset [15]. This is significantly higher than the 62.81% and 64.38% achieved by the state-of-the-art algorithms M-CNN [21] and ATR [15], respectively.
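The description names the two super-pixel operations but does not spell them out. Below is a minimal sketch of one plausible reading, not the Co-CNN implementation:

* Within-super-pixel smoothing averages the per-pixel class confidences inside each superpixel.
* Cross-super-pixel neighborhood voting blends each superpixel's average with the plain mean of its adjacent superpixels.

Several details are hypothetical: the function names, the precomputed superpixel map, 4-connected adjacency, uniform neighbor weights, and the mixing weight `alpha`. The actual method may weight neighbors differently, for example by appearance similarity.

```python
from collections import defaultdict

# Hypothetical sketch: sp[y][x] is a superpixel id, assumed precomputed by an
# external segmentation step; conf[y][x] is a list of C class confidences.


def superpixel_smooth(conf, sp):
    """Within-super-pixel smoothing: average the class-confidence vectors
    of all pixels sharing a superpixel id. Returns {id: mean vector}."""
    sums = {}
    counts = defaultdict(int)
    for conf_row, sp_row in zip(conf, sp):
        for vec, s in zip(conf_row, sp_row):
            acc = sums.setdefault(s, [0.0] * len(vec))
            for k, v in enumerate(vec):
                acc[k] += v
            counts[s] += 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}


def neighbors(sp):
    """Superpixel adjacency: two ids are neighbors if some pair of
    4-connected pixels carries them."""
    adj = defaultdict(set)
    h, w = len(sp), len(sp[0])
    for y in range(h):
        for x in range(w):
            a = sp[y][x]
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and sp[ny][nx] != a:
                    adj[a].add(sp[ny][nx])
                    adj[sp[ny][nx]].add(a)
    return adj


def neighborhood_vote(means, adj, alpha=0.3):
    """Cross-super-pixel voting (assumed form): blend each superpixel's mean
    with the uniform average of its neighbors' means, weighted by alpha."""
    voted = {}
    for s, own in means.items():
        nbrs = adj.get(s)
        if not nbrs:
            voted[s] = list(own)
            continue
        avg = [sum(means[n][k] for n in nbrs) / len(nbrs) for k in range(len(own))]
        voted[s] = [(1 - alpha) * o + alpha * a for o, a in zip(own, avg)]
    return voted


def parse(conf, sp, alpha=0.3):
    """Smooth, vote, then take the per-pixel argmax over classes."""
    voted = neighborhood_vote(superpixel_smooth(conf, sp), neighbors(sp), alpha)

    def argmax(v):
        return max(range(len(v)), key=v.__getitem__)

    return [[argmax(voted[s]) for s in row] for row in sp]


# Toy 2x3 image, 2 classes, 2 superpixels; pixel (1, 0) is noisy.
sp = [[0, 0, 1],
      [0, 0, 1]]
conf = [[[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]],
        [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]]
print(parse(conf, sp))  # [[0, 0, 1], [0, 0, 1]]
```

On its own, the noisy pixel at row 1, column 0 would be labeled class 1. Averaging within its superpixel restores class 0. Setting `alpha` to 0 disables the neighborhood vote and leaves only the smoothing step.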
Keywords: neural network; contextualized convolution; human parsing; global image
Source: VideoLectures.NET
Data collected: 2023-04-22:chenxin01
Last edited: 2023-05-18:chenxin01