


Graphical Models for Speech Recognition: Articulatory and Audio-Visual Models
Course URL: http://videolectures.net/mlss09us_livescu_gmsraavm/
Lecturer: Karen Livescu
Institution: Toyota Technological Institute at Chicago
Date: 2009-07-30
Language: English
Course description: Since the 1980s, the main approach to automatic speech recognition has been to use hidden Markov models (HMMs), in which each state corresponds to a phoneme, or part of a phoneme, in the context of its neighboring phonemes. Despite their crude approximation of the speech signal, and the considerable room for improvement that remains, HMMs have proven difficult to beat. In the last few years, there has been increasing interest in more complex graphical models for speech recognition that involve multiple streams of states. I will describe two such approaches: one models pronunciation variation as the result of "sloppy" behavior of articulatory variables (the states of the lips, tongue, etc.), and the other models the audio and visual states in audio-visual speech recognition (i.e., recognition enhanced by "lipreading").
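The central idea above, replacing a single HMM state sequence with several interacting state streams, can be illustrated with a toy forward algorithm. What follows is a minimal sketch under stated assumptions, not the model presented in the lecture. The function `forward_two_stream` and its parameters are hypothetical. Each stream (audio and visual) is assumed to be a left-to-right chain that either stays or advances one state per frame. The two streams are assumed conditionally independent given their states, and the joint state is restricted so the streams drift apart by at most `max_async` states. The sketch also assumes 0 < `p_advance` < 1 and nonzero frame likelihoods.

```python
import math


def forward_two_stream(audio_lik, visual_lik, n_states, p_advance=0.5, max_async=1):
    """Log-likelihood of paired audio/visual frames under a toy two-stream HMM.

    Hypothetical model (an illustration, not the lecture's model): each stream
    is a left-to-right chain of n_states states; at every frame each stream
    independently stays (1 - p_advance) or advances one state (p_advance),
    and joint states whose indices differ by more than max_async are disallowed.
    audio_lik[t][i]  = p(audio frame t | audio state i)
    visual_lik[t][j] = p(visual frame t | visual state j)
    """
    # Joint hidden states (audio index, visual index) within the asynchrony limit.
    states = [(i, j) for i in range(n_states) for j in range(n_states)
              if abs(i - j) <= max_async]

    def successors(i, j):
        moves = []
        for di in (0, 1):
            for dj in (0, 1):
                ni, nj = i + di, j + dj
                if ni >= n_states or nj >= n_states or abs(ni - nj) > max_async:
                    continue  # off the end of a chain, or too asynchronous
                pa = p_advance if di else 1.0 - p_advance
                pv = p_advance if dj else 1.0 - p_advance
                moves.append(((ni, nj), pa * pv))
        total = sum(p for _, p in moves)
        # Renormalize so the allowed moves out of (i, j) sum to 1.
        return [(s, p / total) for s, p in moves]

    trans = {s: successors(*s) for s in states}

    # Both streams start in state 0; the emission factors across the two streams.
    alpha = {s: 0.0 for s in states}
    alpha[(0, 0)] = audio_lik[0][0] * visual_lik[0][0]
    log_scale = 0.0
    for t in range(1, len(audio_lik)):
        norm = sum(alpha.values())  # rescale each frame to avoid underflow
        log_scale += math.log(norm)
        new_alpha = {s: 0.0 for s in states}
        for s, prob in alpha.items():
            for nxt, p in trans[s]:
                new_alpha[nxt] += (prob / norm) * p
        for i, j in states:
            new_alpha[(i, j)] *= audio_lik[t][i] * visual_lik[t][j]
        alpha = new_alpha

    # Require both streams to finish in their last state.
    final = alpha[(n_states - 1, n_states - 1)]
    return log_scale + math.log(final) if final > 0 else float("-inf")
```

With `max_async=0` the two streams are forced to move in lockstep, so the model reduces to an ordinary HMM whose states emit both observations. Larger values let one stream (for example, the lip movements) lead or lag the other, at the cost of a larger joint state space.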
Keywords: language analysis; hidden Markov models; automatic speech recognition
Source: VideoLectures.NET
Last reviewed: 2020-06-29 (zyk)