Semi-automating the Reading Programme for a Historical Dictionary Project |
|
Course URL: | http://videolectures.net/euralex2018_heid_historical/ |
Lecturer: | Ulrich Heid |
Institution: | University of Hildesheim |
Date: | 2018-07-27 |
Language: | English |
Abstract: | We report on a major enabling step towards the revision of the scholarly reference work A Dictionary of South African English on Historical Principles (DSAE, Silva et al. 1996): the semi-automatic generation of a digitally sourced lexical database on which new and updated dictionary entries will be based, together with the parallel addition of a new corpus of South African English (SAE) to the project. Drawing on online data sources and an extensive list of known SAE word forms, we have developed a software toolchain to gather, encode, annotate and collate textual sources, producing: (i) a 3.1-billion part-of-speech-annotated corpus of South African English; (ii) a lexical database of illustrative quotations for about 20,000 known SAE word forms, available for selection at the entry-revision stage; and (iii) lists of potential variants and inclusion candidates. Where recent electronic sources are concerned, these steps replace the mechanical aspects of quotation gathering, which is normally undertaken manually through a reading programme requiring years of teamwork to acquire sufficient coverage (cf. Hicks, 2010). |
Keywords: | database; reading; words
Source: | VideoLectures.NET |
Data collected: | 2020-11-09:yxd |
Last reviewed: | 2020-11-09:yxd |
Views: | 69 |