The DANTE database: what it is, how it was created, and what it can contribute to the dictionaries and lexicons of the future
Course URL: http://videolectures.net/elex2011_rundell_dante/
Lecturer: Michael Rundell
Institution: Lexicography Master Class
Date: 2011-12-02
Language: English
Course description: Dante is a lexical database which provides a fine-grained, corpus-based description of the core vocabulary of English. Every fact recorded in the database is derived from, and explicitly supported by, evidence from a 1.7 billion-word corpus of current English. Almost all of these facts are machine-retrievable. Dante (the Database of ANalysed Texts of English) was designed and created for Foras na Gaeilge by the Lexicography Master Class and an 18-strong team of skilled lexicographers, using the Sketch Engine (sketchengine.co.uk) for corpus querying and IDM's Dictionary Production System (DPS: idm.fr) for entry building. The resulting database records the semantic, grammatical, combinatorial, and text-type characteristics of over 42,000 single-word lemmas and 23,000 compounds and phrasal verbs, and includes over 27,000 idioms and phrases, underpinned by over 600,000 sentence examples from the corpus. The project pioneered new approaches in project management, software customisation, text origination, and quality control. Collectively, these initiatives enabled us to achieve significant levels of automation (and hence cost savings) in the lexicographic process, as well as greater systematicity. Most of these innovations are transferable, so our experience on the Dante project has implications for lexicographic methodology as a whole. Though Dante began life as an 'English framework' destined for the development of a new English-Irish dictionary, it was designed to be a linguistic resource beyond this primary function. It offers publishers a launchpad for developing or updating monolingual or bilingual dictionaries, and provides rich data for researchers, software developers, and materials writers. In this talk we will discuss the project's methodological innovations, demonstrate the wealth and range of data in Dante, and reflect on the long-term potential of this unique database.
Keywords: lexical database; machine; English
Source: VideoLectures.NET
Last reviewed: 2021-12-22 (liyy)
Views: 94