Automated Character Annotation in Multimedia
Course URL: http://videolectures.net/mcvc08_zisserman_acam/
Lecturer: Andrew Zisserman
Institution: University of Oxford
Date: 2008-02-14
Language: English
Abstract: We describe progress in automatically identifying characters in films and TV series using their detected faces together with readily available annotation in the form of subtitles and transcripts. We describe how the subtitles and transcript can be aligned to give weak supervision on the characters present in a shot (as well as on the actions, emotions, locations, etc.). The supervision is weak because of correspondence problems, and because the named character may not be visible. The visual problem of face recognition is challenging because faces appear in images at various sizes and poses, and also vary considerably in expression. Fortunately, videos contain multiple face examples of each person in a form that can easily be associated automatically using straightforward visual tracking. These multiple examples reduce the ambiguity of recognition. We show that the text supervision can be strengthened by speaker detection. Although the labelling is still incomplete and noisy, it is then sufficient to learn visual models for recognition and to achieve successful character identification. This is joint work with Mark Everingham and Josef Sivic.
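The lecture page gives no code. The following is a minimal Python sketch of the idea of combining subtitles (timed, but without speaker names) with a transcript (speaker names, but without timing) so that each subtitle inherits a name. It assumes a plain word-level edit-distance alignment as a stand-in for whatever alignment the authors actually used. The helper `label_face_tracks` and its `is_speaking` flag are hypothetical: they only illustrate how a speaker-detection signal could filter the weak labels so that names go to faces that appear to be talking. All names and dialogue in the usage example are made up.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def align(a, b):
    """Word-level edit-distance alignment of word lists a and b.

    Returns the (i, j) index pairs where a[i] == b[j] on one optimal path.
    """
    n, m = len(a), len(b)
    # cost[i][j] = edit distance between a[:i] and b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j] + 1,        # skip a word of a
                             cost[i][j - 1] + 1,        # skip a word of b
                             cost[i - 1][j - 1] + sub)  # match or substitute
    # Backtrace from the bottom-right corner, keeping exact word matches.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        sub = 0 if a[i - 1] == b[j - 1] else 1
        if cost[i][j] == cost[i - 1][j - 1] + sub:
            if sub == 0:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def label_subtitles(subtitles, transcript):
    """Attach a speaker name to each timed subtitle.

    subtitles:  list of (start_sec, end_sec, text), timed but unnamed
    transcript: list of (speaker, text), named but untimed
    Returns a list of (start_sec, end_sec, speaker or None).
    """
    sub_words, sub_owner = [], []
    for k, (_, _, text) in enumerate(subtitles):
        for w in tokens(text):
            sub_words.append(w)
            sub_owner.append(k)
    tr_words, tr_speaker = [], []
    for speaker, text in transcript:
        for w in tokens(text):
            tr_words.append(w)
            tr_speaker.append(speaker)
    # Each matched word casts a vote for its transcript speaker.
    votes = [Counter() for _ in subtitles]
    for i, j in align(sub_words, tr_words):
        votes[sub_owner[i]][tr_speaker[j]] += 1
    return [(s, e, v.most_common(1)[0][0] if v else None)
            for (s, e, _), v in zip(subtitles, votes)]


def label_face_tracks(tracks, labelled_subs):
    """Hypothetical helper: weak name labels for face tracks.

    tracks: list of (track_id, start_sec, end_sec, is_speaking); is_speaking
            is assumed to come from a separate speaker detector.
    Only speaking tracks that overlap a named subtitle in time get a label.
    """
    labels = {}
    for tid, ts, te, speaking in tracks:
        if not speaking:
            continue
        for s, e, name in labelled_subs:
            if name is not None and ts < e and s < te:  # intervals overlap
                labels[tid] = name
                break
    return labels


if __name__ == "__main__":
    subs = [(1.0, 2.5, "Hello, my name is Alice."), (3.0, 4.0, "Nice to meet you.")]
    script = [("ALICE", "Hello! My name is Alice."), ("BOB", "Nice to meet you!")]
    labelled = label_subtitles(subs, script)
    print(labelled)  # [(1.0, 2.5, 'ALICE'), (3.0, 4.0, 'BOB')]
    print(label_face_tracks([(7, 0.8, 2.0, True), (8, 3.2, 3.9, False)], labelled))
    # {7: 'ALICE'}
```

Track 8 overlaps Bob's line but is not detected as speaking, so it stays unlabelled. This mirrors, in a very simplified form, why the abstract calls the resulting labels incomplete but less noisy once speaker detection is added.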
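Keywords: visual models; automatic identification using readily available annotation; visual tracking; automatic association
Source: VideoLectures.NET
Last edited: 2019-05-16 (cjy)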