
Connectionist Temporal Classification for End-to-End Speech Recognition
Course URL: http://videolectures.net/interACT2016_metze_temporal_classificati...  
Lecturer: Florian Metze
Institution: Carnegie Mellon University
Date: 2016-07-31
Language: English
Abstract: The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. In this talk, I will present an approach that drastically simplifies building acoustic models for the existing weighted finite-state transducer (WFST) based decoding approach, and lends itself to end-to-end speech recognition, allowing optimization for arbitrary criteria. Acoustic modeling now involves learning a single recurrent neural network (RNN), which predicts context-independent targets (e.g., syllables, phonemes or characters). The connectionist temporal classification (CTC) objective function marginalizes over all possible alignments between speech frames and label sequences, removing the need for a separate alignment of the training data. We present a generalized decoding approach based on WFSTs, which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that this approach achieves state-of-the-art word error rates, while drastically reducing complexity and speeding up decoding when compared to standard hybrid DNN systems.
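The claim that the CTC objective "marginalizes over all possible alignments" can be made concrete with the standard CTC forward (alpha) recursion, which sums the probability of every frame-level path that collapses to the target label sequence. The following is a minimal sketch in plain Python, not the speaker's implementation, and it does not cover the WFST-based decoding described in the talk. It assumes that label index 0 is the blank and that the per-frame probabilities are already normalized; the function names are hypothetical. A brute-force enumeration over all paths is included to check the recursion on a tiny example.

```python
import itertools
import math

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def collapse(path, blank=BLANK):
    """CTC many-to-one mapping: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for p in path:
        if p != prev and p != blank:
            out.append(p)
        prev = p
    return out


def ctc_forward_prob(probs, target, blank=BLANK):
    """Sum of P(path) over all alignments that collapse to `target`.

    probs: T rows, each a list of per-label probabilities for one frame.
    target: label indices without blanks.
    """
    # Extended sequence with blanks interleaved: _, l1, _, l2, ..., _
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # alpha[s]: total probability of partial paths ending in ext[s] at frame t
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]       # advance by one symbol
            # skip over a blank, allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


def brute_force_prob(probs, target, blank=BLANK):
    """Exponential-time reference: enumerate every frame-level path."""
    num_labels = len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_labels), repeat=len(probs)):
        if collapse(path, blank) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


if __name__ == "__main__":
    # Toy posteriors: 4 frames, labels {0: blank, 1: "a", 2: "b"}
    probs = [[0.5, 0.3, 0.2],
             [0.4, 0.4, 0.2],
             [0.3, 0.3, 0.4],
             [0.6, 0.2, 0.2]]
    print(ctc_forward_prob(probs, [1, 2]), brute_force_prob(probs, [1, 2]))
```

The recursion runs in O(T·S) time rather than enumerating all K^T paths, which is what makes training on unsegmented data practical. The loss minimized during training would be the negative log of this quantity; a real system would work in log space to avoid underflow on long utterances.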
Keywords: automatic speech recognition; end-to-end speech recognition; acoustic modeling
Source: VideoLectures.NET
Data collected: 2021-11-26 (zkj)
Last edited: 2021-11-26 (zkj)