Unsupervised Learning for Natural Language Processing
Course URL: http://videolectures.net/uai08_klein_ul/
Lecturer: Dan Klein
Institution: University of California
Date: 2008-07-30
Language: English
Description: Given the abundance of text data, unsupervised approaches are very appealing for natural language processing. We present three latent variable systems that achieve state-of-the-art results in domains previously dominated by fully supervised systems. For syntactic parsing, we describe a grammar induction technique that begins with coarse syntactic structures and iteratively refines them in an unsupervised fashion. The resulting coarse-to-fine grammars admit efficient coarse-to-fine inference schemes and have produced the best parsing results in a variety of languages. For coreference resolution, we describe a discourse model in which entities are shared across documents using a hierarchical Dirichlet process. In each document, entities are repeatedly rendered into mention strings by a sequential model of attentional state and anaphoric constraint. Despite being fully unsupervised, this approach is competitive with the best supervised approaches. Finally, for machine translation, we present a model that learns translation lexicons from non-parallel corpora. Alignments between word types are modeled by a prior over matchings. Given any fixed alignment, a joint density over word vectors derives from probabilistic canonical correlation analysis. This approach is capable of discovering high-precision translations, even when the underlying corpora and languages are divergent.
Keywords: natural language processing; grammar induction; matching models
Source: VideoLectures.NET
Last reviewed: 2020-06-27 (zyk)
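
The description says that alignments between word types are modeled as a matching. As a rough illustration only, the sketch below picks the one-to-one matching with the highest total similarity between two tiny vocabularies. It is not the lecture's model: it assumes the word vectors already live in a shared space (the step the description attributes to probabilistic canonical correlation analysis, which is not implemented here), it replaces the prior over matchings with a plain exhaustive search, and the function names, words, and feature values are all hypothetical.

```python
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def best_matching(source, target):
    """Exhaustively find the one-to-one matching with the highest total similarity.

    source, target: dicts mapping a word to its feature vector.
    Assumes len(source) <= len(target); only feasible for toy vocabularies,
    since the number of candidate matchings grows factorially.
    """
    src_words = list(source)
    tgt_words = list(target)
    best_score, best_pairs = float("-inf"), []
    for perm in permutations(tgt_words, len(src_words)):
        score = sum(cosine(source[s], target[t]) for s, t in zip(src_words, perm))
        if score > best_score:
            best_score, best_pairs = score, list(zip(src_words, perm))
    return best_pairs, best_score


# Hypothetical 3-dimensional feature vectors, invented for illustration only.
english = {
    "state": [0.9, 0.1, 0.2],
    "society": [0.1, 0.8, 0.3],
    "growth": [0.2, 0.2, 0.9],
}
spanish = {
    "crecimiento": [0.1, 0.3, 0.8],
    "estado": [0.8, 0.2, 0.1],
    "sociedad": [0.2, 0.9, 0.2],
}

pairs, score = best_matching(english, spanish)
for src, tgt in pairs:
    print(src, "->", tgt)
```

For realistic vocabulary sizes an exhaustive search is infeasible, and a polynomial-time assignment solver would be the usual substitute; the brute-force loop is used here only to keep the sketch dependency-free.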