Early language bootstrapping |
Course URL: | http://videolectures.net/mlss2011_dupoux_bootstrapping/ |
Lecturer: | Emmanuel Dupoux |
Institution: | École Normale Supérieure, France |
Date: | 2011-10-12 |
Course language: | English |
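Chinese summary (translated): | Despite the great complexity of the task, human infants learn the language used in their environment naturally and easily. Over the past 30 years, empirical research on infants' linguistic achievements during their first two years of life has made tremendous progress. Within that short span, infants learn, in an essentially unsupervised fashion, the basic building blocks of the phonetic, phonological, lexical and syntactic organization of their native language (see Jusczyk, 1987). Yet little is known about the mechanisms responsible for this acquisition. Do infants rely on general principles of statistical inference? Do they rely on algorithms specialized for language? Here, I will give an overview of the early stages of language acquisition and focus on one area where a modeling approach is currently under way, using tools from signal processing and automatic speech recognition: the unsupervised acquisition of phonetic categories. It is known that during the first year of life, before they can talk, infants build a detailed representation of the phonemes of their native language and lose the ability to distinguish nonnative phonemic contrasts (Werker & Tees, 1984). It will be shown that the only mechanism proposed so far, unsupervised statistical clustering (Maye, Werker and Gerken, 2002), does not converge on the phoneme inventory but rather on contextual allophonic units or subunits (Varadarajan, 2008). An information-theoretic algorithm will be presented: it groups allophonic variants together based on three sources of information: the statistical distribution of their contexts, the phonetic plausibility of the grouping, and the existence of lexical minimal pairs (Peperkamp et al., 2006; Martin et al., submitted). The results show that each of the three sources of information can be acquired without presupposing the others. The algorithm is then tested on several natural speech corpora. The more general proposal is that early language bootstrapping does not rely on learning principles that are necessarily specific to language. What may be unique to language, however, is the way these principles are combined to optimize the emergence of linguistic categories after only a few months of unsupervised exposure to speech signals. |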
Course description: | Human infants spontaneously and effortlessly learn the language(s) spoken in their environments, despite the extraordinary complexity of the task. In the past 30 years, tremendous progress has been made in the empirical investigation of the linguistic achievements of infants during their first two years of life. In that short period, infants learn, in an essentially unsupervised fashion, the basic building blocks of the phonetic, phonological, lexical and syntactic organization of their native language (see Jusczyk, 1987). Yet little is known about the mechanisms responsible for such acquisitions. Do infants rely on general statistical inference principles? Do they rely on specialized algorithms devoted to language? Here, I will present an overview of the early phases of language acquisition and focus on one area where a modeling approach is currently being pursued, using tools of signal processing and automatic speech recognition: the unsupervised acquisition of phonetic categories. It is known that during the first year of life, before they are able to talk, infants construct a detailed representation of the phonemes of their native language and lose the ability to distinguish nonnative phonemic contrasts (Werker & Tees, 1984). It will be shown that the only mechanism that has been proposed so far, that is, unsupervised statistical clustering (Maye, Werker and Gerken, 2002), does not converge on the inventory of phonemes, but rather on contextual allophonic units or subunits (Varadarajan, 2008). An information-theoretic algorithm will be presented: it groups together allophonic variants based on three sources of information: the statistical distribution of their contexts, the phonetic plausibility of the grouping, and the existence of lexical minimal pairs (Peperkamp et al., 2006; Martin et al., submitted). It is shown that each of the three sources of information can be acquired without presupposing the others.
This algorithm is then tested on several natural speech corpora. The more general proposal is that early language bootstrapping does not rely on learning principles necessarily specific to language. What is presumably unique to language, though, is the way in which these principles are combined in particular ways to optimize the emergence of linguistic categories after only a few months of unsupervised exposure to speech signals. |
Keywords: | native language; modeling approach; recognition tools |
Source: | VideoLectures.NET |
Last reviewed: | 2020-01-13: chenxin |
Views: | 81 |
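
The first information source named in the course description, the statistical distribution of contexts, can be illustrated with a small sketch. This is not the lecturer's implementation. It assumes a symmetrised Kullback-Leibler divergence with add-one smoothing over following-segment contexts, and the toy corpus, the segment labels (`r`, `R`, `l`) and the function names are all hypothetical. The idea shown is that two variants which never share a context (here `r` before vowels, `R` before voiceless stops) get a high divergence and so become candidates for grouping, while segments with overlapping contexts (`r` and `l`) do not.

```python
import math
from collections import Counter, defaultdict


def context_distributions(words):
    """Map each segment to a Counter of the segments that follow it.

    Each word is a string whose characters are segments; '#' marks word end.
    """
    contexts = defaultdict(Counter)
    for word in words:
        for i, seg in enumerate(word):
            nxt = word[i + 1] if i + 1 < len(word) else "#"
            contexts[seg][nxt] += 1
    return contexts


def symmetric_kl(counts_a, counts_b, vocab, alpha=1.0):
    """Symmetrised KL divergence between two add-alpha smoothed distributions."""
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    div = 0.0
    for c in vocab:
        p = (counts_a[c] + alpha) / total_a
        q = (counts_b[c] + alpha) / total_b
        div += p * math.log(p / q) + q * math.log(q / p)
    return div


# Hypothetical toy corpus: 'R' is a contextual variant of 'r' that only
# appears before 't' or 'p'; 'l' is a separate sound sharing contexts with 'r'.
corpus = ["rat", "ri", "aRt", "aRp", "la", "li", "lat"]
ctx = context_distributions(corpus)

# The context vocabulary is every following-context symbol seen in the corpus.
vocab = sorted({c for counter in ctx.values() for c in counter})

complementary = symmetric_kl(ctx["r"], ctx["R"], vocab)  # disjoint contexts
overlapping = symmetric_kl(ctx["r"], ctx["l"], vocab)    # shared contexts

print(f"D(r, R) = {complementary:.4f}")
print(f"D(r, l) = {overlapping:.4f}")
```

A high divergence on its own only flags complementary distribution. The description adds two further sources, phonetic plausibility and lexical minimal pairs, which this sketch does not model.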