Preserving Metadata from Parliamentary Debates |
|
Course URL: | http://videolectures.net/parlaCLARIN2018_karakanta_parliamentary_... |
Lecturer: | Alina Karakanta |
Institution: | Saarland University |
Date: | 2018-05-30 |
Language: | English |
Course description: | Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, extra-textual information about the speakers or the original language of the sentences is often absent, and as a result these resources cannot be fully used in translation studies. In this paper, we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish. The paper documents all necessary (pre- and post-) processing steps for creating such a valuable resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method. |
Keywords: | multilingual parliaments; multilingual corpora; monolingual comparable corpora |
Source: | VideoLectures.NET |
Data collected: | 2020-11-26:cjy |
Last reviewed: | 2020-11-26:cjy |
View count: | 42 |