
Employing The Complete Face in AVSR to Recover from Facial Occlusions
Course URL: http://videolectures.net/wapa2011_hall_occlusions/
Lecturer: Ben Hall
Institution: University College London
Date: 2011-11-11
Language: English
Course description: Existing Audio-Visual Speech Recognition (AVSR) systems focus their visual processing intensely on a small region of the face, centred on the immediate mouth area. In real-world situations this is poor design for a variety of reasons, because any occlusion of this small area renders all visual advantage null and void. It is also poor by design, because it is well known that humans use the complete face to speechread. We demonstrate a new application of a novel visual algorithm, the Multi-Channel Gradient Model (MCGM), that deploys information from the complete face to perform AVSR. Our MCGM model performs close to Discrete Cosine Transforms (DCTs) when a small region of interest around the lips is used, but when the face is occluded we can achieve results that match nearly 70% of the performance DCTs achieve in their best case, the lip-centric approach.
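The DCT baseline in the description is, in many AVSR systems, a 2D DCT computed over a mouth-centred region of interest (ROI), with only the low-frequency coefficients kept as visual features. The following is a minimal sketch under stated assumptions, not the lecture's implementation: the ROI is assumed to be a grey-level image given as a list of rows, and the top-left k×k block of coefficients is kept (the lecture does not say how many or which coefficients were used; `dct2` and `dct_features` are hypothetical names). It also shows why occlusion hurts this approach: every DCT coefficient is a weighted sum over every pixel of the ROI, so hiding part of the mouth generally perturbs the whole feature vector.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct2(block: Matrix) -> Matrix:
    """Orthonormal 2D DCT-II of an M x N block of grey-level pixels."""
    m, n = len(block), len(block[0])

    def alpha(k: int, size: int) -> float:
        # Scaling factor that makes the transform orthonormal.
        return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)

    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        for v in range(n):
            total = 0.0
            for x in range(m):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * m))
                for y in range(n):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                    total += block[x][y] * cx * cy
            out[u][v] = alpha(u, m) * alpha(v, n) * total
    return out


def dct_features(roi: Matrix, k: int = 3) -> List[float]:
    """Hypothetical feature extractor: keep the k x k lowest-frequency coefficients."""
    coeffs = dct2(roi)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


if __name__ == "__main__":
    # Synthetic 8x8 "mouth ROI"; real systems would crop this from video frames.
    roi = [[float((x + y) % 4) for y in range(8)] for x in range(8)]
    # Simulate an occlusion by blanking the right half of the ROI.
    occluded_roi = [row[:4] + [0.0] * 4 for row in roi]
    clean = dct_features(roi)
    occluded = dct_features(occluded_roi)
    print("largest feature change:", max(abs(a - b) for a, b in zip(clean, occluded)))
```

A production pipeline would normally use a library routine such as SciPy's `scipy.fft.dctn` with `norm="ortho"` instead of this O(M²N²) loop; the explicit version is kept here so that the formula stays visible.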
Keywords: speech recognition; mouth region; gradient model
Source: VideoLectures.NET
Last edited: 2019-10-11 (cwx)