Audio-Visual Speech Analysis & Recognition
Course URL: http://videolectures.net/mcvc08_katsamanis_avsa/
Lecturer: Nassos Katsamanis
Institution: National Technical University of Athens
Date: 2008-02-14
Language: English
Course description: Human speech production and perception mechanisms are essentially bimodal. Interesting evidence for this audiovisual nature of speech is provided by the so-called McGurk effect. To properly account for the complementary visual aspect, we propose a unified framework to analyse speech and present our related findings in applications such as audiovisual speech inversion and recognition. The speaker's face is analysed by means of Active Appearance Modelling, and the extracted visual features are integrated with simultaneously extracted acoustic features to recover underlying articulatory properties, e.g., the movement of the speaker's tongue tip, or to recognize the recorded utterance, e.g., a spoken digit sequence. Possible asynchrony between the audio and visual streams is also taken into account. For recognition, we further exploit the feature uncertainty given by the corresponding front-ends to achieve adaptive fusion. Experimental results are presented on the QSMT, MOCHA and CUAVE audiovisual databases.
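The adaptive, uncertainty-driven fusion mentioned in the description can be illustrated with a minimal sketch. This is not the lecture's exact model, only the general idea: assume each front-end (audio, visual) emits a feature estimate together with a variance expressing its confidence, and fuse the two by inverse-variance (precision) weighting, so that whichever stream is more reliable at a given frame dominates the combined estimate. The function name and numbers below are illustrative assumptions.

```python
# Hypothetical sketch of uncertainty-driven audio-visual fusion
# (inverse-variance weighting), not the authors' exact formulation.

def fuse(audio_mean, audio_var, visual_mean, visual_var):
    """Precision-weighted fusion of one audio and one visual estimate.

    Each stream contributes with weight 1/variance, so a noisy stream
    (large variance) is automatically down-weighted, frame by frame.
    """
    w_a = 1.0 / audio_var      # precision of the audio front-end
    w_v = 1.0 / visual_var     # precision of the visual front-end
    fused_mean = (w_a * audio_mean + w_v * visual_mean) / (w_a + w_v)
    fused_var = 1.0 / (w_a + w_v)  # fused estimate is more certain than either
    return fused_mean, fused_var

# When the audio front-end is noisy (variance 4.0 vs. 1.0), the fused
# estimate lies much closer to the visual one:
mean, var = fuse(audio_mean=2.0, audio_var=4.0, visual_mean=1.0, visual_var=1.0)
# mean = (0.25*2.0 + 1.0*1.0) / 1.25 = 1.2,  var = 1/1.25 = 0.8
```

In an acoustically clean recording the weighting reverses and the audio stream dominates, which is the "adaptive" aspect the description refers to.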
Keywords: perception mechanisms; audio-visual speech recognition; acoustic features; adaptive fusion
Source: VideoLectures.NET
Last reviewed: 2020-04-01, chenxin
Views: 147