Preserving Metadata from Parliamentary Debates
Course URL: http://videolectures.net/parlaCLARIN2018_karakanta_parliamentary_...
Lecturer: Alina Karakanta
Institution: Saarland University
Date: 2018-05-30
Language: English
Abstract: Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about the speakers or the original language of the sentences is often absent, and as a result these resources cannot be fully used in translation studies. In this paper we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish. The paper documents all necessary pre- and post-processing steps for creating such a resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method.
Keywords: language; corpus; metadata
Source: VideoLectures.NET
Data collected: 2020-11-02:yxd
Last reviewed: 2020-11-03:zyk
Views: 32