Advances in Cross-Lingual Syntactic Transfer |
|
Course URL: | http://videolectures.net/nipsworkshops2012_mcdonald_syntactic_tra... |
Lecturer: | Ryan McDonald |
Institution: | Google |
Date: | 2013-01-11 |
Language: | English |
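Synopsis: | The idea of using annotated resources in one language to learn models for another has been around for at least a decade. Such models have usually depended on access to parallel data. Recent approaches, however, concentrate on "direct" cross-lingual transfer, and on delexicalized transfer in particular. A delexicalized parsing model conditions only on input properties that are available across languages, usually induced tags or clusters. Because these properties are universal, a parser trained on English can be applied directly to any other language. This simple approach has proved surprisingly effective and beats the best weakly supervised models by a clear margin. The assumptions behind these models, however, are far too weak to reach the parsing accuracy of monolingual supervised methods. In this talk, I will focus on porting ideas from work on selective parameter sharing in multi-source direct transfer to high-accuracy latent CRF parsing models. I will then present novel semi-supervised learning algorithms that relexicalize these models on unlabeled target-language data, yielding significant improvements. The final model brings us one step closer to building robust syntactic parsers for all of the world's languages. |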
Course description: | The idea to use annotated resources from one language to learn models for another has been around for at least a decade. Typically, these models have relied on access to parallel data. However, recent approaches have focused on "direct" cross-lingual transfer, and in particular, delexicalized transfer. Delexicalized parsing models are conditioned only on properties of the input that are available across languages, typically induced tags or clusters. Since these properties are universally available, it is possible to directly use a parser trained on English for every other language. This simple method has shown itself to be surprisingly effective and outperforms the best weakly-supervised models by a significant margin. However, the assumptions underlying these models are far too weak to obtain parsing accuracies at the level of monolingual supervised methods. In this talk I will focus on porting ideas from work on selective parameter sharing in multi-source direct transfer to highly accurate latent CRF parsing models. I will then present novel semi-supervised learning algorithms that relexicalize these models on unlabeled target language data to give significant improvements. The final model brings us one step closer to building robust syntactic parsers for all the world's languages. |
Keywords: | language data; syntactic parsing; CRF parsing model; weakly supervised model; language parser |
Source: | VideoLectures.NET |
Data collected: | 2021-05-26:zyk |
Last reviewed: | 2021-05-26:zyk |
Views: | 90 |
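
The delexicalized transfer idea in the abstract can be illustrated with a toy sketch. This is not the latent CRF parser from the talk. It is a deliberately simple stand-in that counts which tag attaches to which tag in an English treebank and reuses those counts on another language. The sentences, the universal part-of-speech tags, and the function names (`delexicalize`, `train_head_tag_counts`, `predict_heads`) are all assumptions made for illustration, and the greedy head choice is not guaranteed to produce a well-formed tree.

```python
from collections import Counter, defaultdict

# Toy English treebank, invented for illustration.
# Each token is (word, universal POS tag, head index), with 0 meaning root.
ENGLISH_TRAIN = [
    [("the", "DET", 2), ("dog", "NOUN", 3), ("barks", "VERB", 0)],
    [("a", "DET", 2), ("cat", "NOUN", 3), ("sleeps", "VERB", 0)],
]


def delexicalize(sentence):
    """Drop word forms and keep only the cross-lingual tag and the head."""
    return [(tag, head) for _word, tag, head in sentence]


def train_head_tag_counts(treebank):
    """Count how often each dependent tag attaches to each head tag."""
    counts = defaultdict(Counter)
    for sentence in treebank:
        delex = delexicalize(sentence)
        for tag, head in delex:
            head_tag = "ROOT" if head == 0 else delex[head - 1][0]
            counts[tag][head_tag] += 1
    return counts


def predict_heads(tags, counts):
    """Greedily pick a head for each token from tag statistics alone.

    Candidates are scored by how often their tag headed this token's tag
    in training; ties go to the nearest candidate.
    """
    heads = []
    for i, tag in enumerate(tags, start=1):
        # Candidate heads: the artificial root (index 0) or any other token.
        candidates = [(0, "ROOT")] + [
            (j, t) for j, t in enumerate(tags, start=1) if j != i
        ]
        best = max(
            candidates,
            key=lambda c: (counts[tag][c[1]], -abs(c[0] - i)),
        )
        heads.append(best[0])
    return heads


if __name__ == "__main__":
    counts = train_head_tag_counts(ENGLISH_TRAIN)
    # Tags for the German sentence "der Hund bellt"; no German words
    # or German training data are used.
    print(predict_heads(["DET", "NOUN", "VERB"], counts))  # [2, 3, 0]
```

Because the model never sees a word form, it applies unchanged to any language with the same tag inventory. It also cannot use any word-specific evidence, which is the gap that the relexicalization step mentioned in the abstract aims to close.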