Microphone Array Driven Speech Recognition: Influence of Localization on the Word Error Rate
Course URL: http://videolectures.net/mlmi04uk_wolfel_ilwer/
Lecturer: Matthias Wolfel
Institution: University of Karlsruhe
Date: 2007-02-25
Language: English
Abstract: Interest within the automatic speech recognition research community has recently focused on recognizing speech captured by a microphone located in the medium field, rather than one mounted on a headset and positioned next to the speaker's mouth, in order to realize the long-term goal of ubiquitous computing. This is a natural application for beamforming techniques using a microphone array. A crucial ingredient for optimal beamforming performance is the speaker location; hence, applying such techniques requires a source localization algorithm. In prior work, we proposed using an extended Kalman filter to directly update position estimates in a speaker localization system based on time delays of arrival. We have also enhanced our audio localizer with video information. In this work, we investigate the influence of the speaker position on the word error rate of an automatic speech recognition system operating on the output of a beamformer, and compare this error rate with that obtained with a close-talking microphone. Moreover, we compare the effectiveness of different localization algorithms. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that accurate speaker tracking is crucial for minimizing the errors of a far-field speech recognition system.
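Keywords: automatic speech recognition research community; medium field; array beamforming; algorithm; data set
Source: VideoLectures.NET
Last reviewed: 2020-06-29 (yumf)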