| Preserving Metadata from Parliamentary Debates | |
| Course URL: | http://videolectures.net/parlaCLARIN2018_karakanta_parliamentary_... | 
| Lecturer: | Alina Karakanta | 
| Institution: | Saarland University | 
| Date: | 2018-05-30 | 
| Language: | English | 
| Course description: | Multilingual parliaments have been a useful source for monolingual and multilingual corpus collection. However, it is often the case that extra-textual information about speakers or the original language of the sentences is absent, and as a result, these resources cannot be fully used in translation studies. In this paper we present a method for processing and building a parallel corpus consisting of parliamentary debates of the European Parliament for English into German and English into Spanish. The paper documents all necessary (pre- and post-) processing steps for creating such a valuable resource. In addition to the parallel corpora, we collect monolingual comparable corpora for English, German and Spanish using the same method. | 
| Keywords: | language; corpus; metadata | 
| Source: | VideoLectures.NET | 
| Data collected: | 2020-11-02:yxd | 
| Last reviewed: | 2020-11-03:zyk | 