Cocktail Party Problem as Binary Classification
Course URL: http://videolectures.net/mlss09us_wang_cppbc/  
Lecturer: DeLiang Wang
Institution: The Ohio State University
Date: 2009-07-30
Language: English
Course description: Speech segregation, or the cocktail party problem, has proven to be extremely challenging. Part of the challenge stems from the lack of a carefully analyzed computational goal. While the separation of every sound source in a mixture is considered the gold standard, I argue that such an objective is neither realistic nor what the human auditory system does. Motivated by the auditory masking phenomenon, we have suggested instead the ideal time-frequency (T-F) binary mask as a main goal for computational auditory scene analysis. Ideal binary masking retains the mixture energy in T-F units where the local signal-to-noise ratio exceeds a certain threshold, and rejects the mixture energy in other T-F units. Recent psychophysical evidence shows that ideal binary masking leads to large speech intelligibility improvements in noisy environments for both normal-hearing and hearing-impaired listeners. The effectiveness of the ideal binary mask implies that sound separation may be formulated as a case of binary classification, which opens the cocktail party problem to a variety of pattern classification and clustering methods. As an example, I discuss a recent system that segregates unvoiced speech by supervised classification of acoustic-phonetic features.
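The ideal binary mask described above can be illustrated with a minimal sketch. It is not the lecturer's implementation. It assumes the target and noise energies are known separately for each T-F unit (which is what makes the mask "ideal"), and the names `ideal_binary_mask`, `apply_mask`, and `lc_db` (the local SNR threshold in dB) are hypothetical. The toy mixture energy is taken as the sum of target and noise energies, a simplification that ignores cross terms.

```python
import numpy as np


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0, eps=1e-12):
    """Label each T-F unit 1 if its local SNR (in dB) exceeds lc_db, else 0.

    Both inputs are arrays of shape (frequency channels, time frames).
    lc_db and eps are illustrative parameters, not values from the lecture.
    """
    target_energy = np.asarray(target_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    # eps guards against division by zero and log of zero in silent units.
    local_snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (local_snr_db > lc_db).astype(np.uint8)


def apply_mask(mixture_energy, mask):
    """Retain mixture energy where mask is 1 and reject it where mask is 0."""
    return np.asarray(mixture_energy, dtype=float) * mask


# Toy data: 2 frequency channels x 3 time frames (made-up numbers).
target = np.array([[4.0, 1.0, 0.0],
                   [0.5, 9.0, 2.0]])
noise = np.array([[1.0, 2.0, 1.0],
                  [0.5, 1.0, 4.0]])

mask = ideal_binary_mask(target, noise)      # 0 dB threshold
retained = apply_mask(target + noise, mask)  # assumed additive energies
print(mask)
print(retained)
```

Because each T-F unit receives a 0 or 1 label, estimating the mask from the mixture alone becomes a binary classification problem per unit, which is the formulation the description argues for.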
Keywords: speech segregation; gold standard; ideal time-frequency binary mask
Source: VideoLectures.NET
Last reviewed: 2019-07-23:cwx
Views: 55