

Finding Acoustic Regularities in Speech: From Words to Segments
Course URL: http://videolectures.net/clsp_glass_regularities/
Lecturer: Jim Glass
Institution: Massachusetts Institute of Technology
Course date: Not available.
Course language: English
Course description: The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, along with annotated training corpora. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer remains relatively static thereafter. While this approach has been effective for problems where adequate human expertise and labeled corpora are available, it is challenged by less-supervised or unsupervised scenarios. It also stands in stark contrast to human processing of speech and language, where learning is an intrinsic capability. From a machine learning perspective, a complementary alternative is to discover unit inventories in an unsupervised manner by exploiting the structure of repeating acoustic patterns within the speech signal. In this work we use pattern discovery methods to automatically acquire lexical entities, as well as speaker and topic segmentations, directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multi-word phrases. On a corpus of lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream. We have applied the acoustic pattern matching and clustering methods to several important problems in speech and language processing. In addition to showing how this methodology applies across different languages, we demonstrate two methods to automatically determine the identity of speech clusters. Finally, we also show how it can be used to provide an unsupervised segmentation of speakers and topics. Joint work with Alex Park, Igor Malioutov, and Regina Barzilay.
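The description does not name the dynamic programming technique it builds on. As a rough illustration only, the sketch below assumes it is dynamic time warping (DTW) and shows one way a "segmental" variant might look: alignment paths are confined to narrow diagonal bands, and each band is scored by its average frame distortion, so a pattern shared by two utterances can be found even when it starts at different times. The names `banded_dtw` and `best_band` and the `width` parameter are hypothetical and are not taken from the lecture.

```python
import math


def banded_dtw(x, y, offset, width):
    """Average distortion of the best DTW path inside one diagonal band.

    x, y   : sequences of feature frames (each frame a sequence of floats)
    offset : the band is centered on the diagonal j - i = offset
    width  : path cells (i, j) must satisfy |(j - i) - offset| <= width
    """
    n, m = len(x), len(y)
    # The path starts where the band's center diagonal enters the grid.
    i0, j0 = (0, offset) if offset >= 0 else (-offset, 0)
    cost = {(i0, j0): math.dist(x[i0], y[j0])}  # accumulated distance
    steps = {(i0, j0): 1}                       # path length in cells
    for i in range(i0, n):
        for j in range(j0, m):
            if (i, j) == (i0, j0) or abs((j - i) - offset) > width:
                continue
            # Standard DTW moves: from above, from the left, or diagonally.
            prev = [p for p in ((i - 1, j), (i, j - 1), (i - 1, j - 1))
                    if p in cost]
            if not prev:
                continue
            best = min(prev, key=cost.get)
            cost[(i, j)] = cost[best] + math.dist(x[i], y[j])
            steps[(i, j)] = steps[best] + 1
    # The path ends wherever the band reaches the last row or column.
    ends = [c for c in cost if c[0] == n - 1 or c[1] == m - 1]
    return min(cost[c] / steps[c] for c in ends)


def best_band(x, y, width=1):
    """Try every diagonal offset; return (offset, average distortion)."""
    offsets = range(-(len(x) - 1), len(y))
    scored = [(banded_dtw(x, y, k, width), k) for k in offsets]
    distortion, offset = min(scored)
    return offset, distortion


# Toy one-dimensional "features": y contains x after two unrelated frames.
x = [[0.0], [1.0], [2.0], [3.0]]
y = [[9.0], [9.0], [0.0], [1.0], [2.0], [3.0]]
print(best_band(x, y, width=1))
```

In this toy case the lowest-distortion band is the one whose offset matches the two-frame delay of the repeated pattern. A fuller system would presumably work on real acoustic features and keep only the low-distortion portion of each path before clustering the matches, which this sketch does not attempt.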
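Keywords: automatic speech recognition; dynamic programming; acoustic pattern matching; clustering methods
Course source: VideoLectures.NET
Last reviewed: 2019-11-17:cwx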