
Semi-automating the Reading Programme for a Historical Dictionary Project
Course URL: http://videolectures.net/euralex2018_heid_historical/  
Lecturer: Ulrich Heid
Institution: University of Hildesheim
Date: 2018-07-27
Language: English

Course description: We report on a major enabling step towards the revision of the scholarly reference work A Dictionary of South African English on Historical Principles (DSAE, Silva et al. 1996), namely the semi-automatic generation of a digitally sourced lexical database on which new and updated dictionary entries will be based, as well as the addition, in parallel, of a new corpus of South African English (SAE) to the project. Drawing on online data sources and an extensive list of known SAE word forms, we have developed a software toolchain to gather, encode, annotate and collate textual sources, producing: (i) a 3.1-billion part-of-speech-annotated corpus of South African English; (ii) a lexical database of illustrative quotations for about 20,000 known SAE word forms, available for selection at the entry-revision stage; and (iii) lists of potential variants and inclusion candidates. Where recent electronic sources are concerned, these steps replace the mechanical aspects of quotation gathering, normally undertaken manually through a reading programme requiring years of teamwork to acquire sufficient coverage (cf. Hicks, 2010).
Keywords: semi-automated historical dictionary; South African English corpus; reading programme
Source: VideoLectures.NET
Data collected: 2020-11-26:cjy
Last reviewed: 2020-11-26:cjy
Views: 30