
Polish Parliamentary Corpus
Course URL: http://videolectures.net/parlaCLARIN2018_ogrodniczuk_parliamentar...
Lecturer: Maciej Ogrodniczuk
Institution: Polish Academy of Sciences
Date: 2018-05-30
Language: English

Course description: This paper presents the Polish Parliamentary Corpus (PPC), a new resource built upon the Polish Sejm Corpus and extended with current Senate proceedings and older (1918–1990) parliamentary transcripts. Corpus texts are automatically annotated with state-of-the-art language tools for Polish, resulting in multi-layered stand-off annotation: sentence- and token-level segmentation, disambiguated morphosyntactic information, syntactic words and groups, named entities, and coreference. The corpus is constantly updated with new data from current sittings. The PPC is currently among the largest parliamentary corpora worldwide, amounting to approximately 300 million words.
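Keywords: Polish Parliamentary Corpus; state-of-the-art language tools; multi-layered stand-off annotation
Source: VideoLectures.NET
Data collected: 2021-02-13:cjy
Last reviewed: 2021-02-13:cjy
Views: 69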