Unified Data Modelling for Presenting Lexical Data: the Case of EKILEX
Course URL: http://videolectures.net/euralex2018_tavast_modelling/
Lecturer: Arvi Tavast
Institution: Institute of the Estonian Language
Date: 2018-07-27
Language: English
Abstract: The Institute of the Estonian Language is developing EKILEX, a new dictionary writing system for both semasiological dictionaries and onomasiological termbases. While the long-term vision is a single data source that provides consistent information about Estonian, the system also has to cope with the many existing datasets. In this paper, we present work in progress on modelling the data and importing an initial sample of legacy dictionaries. The data model is based on an m:n relation between words and meanings, both of which are unified across dictionaries even though the system still contains separate dictionaries. Only the mapping between words and meanings is dictionary-specific. Importing the dictionaries has revealed various data-quality issues: ambiguities, underspecification, inconsistencies and conflicts. These must be resolved if the long-term vision is to be achieved. We also outline the next steps: human- and machine-readable publishing, corpus connection, and quantification (frequency, salience measures, etc.).
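The word-meaning model described in the abstract can be illustrated with a minimal Python sketch. It is not EKILEX's actual schema: the class names Word, Meaning, Lexeme and LexicalStore, the dictionary codes dict_A and termbase_B, and the sample entries are all hypothetical. The sketch only assumes what the abstract states: words and meanings are shared across dictionaries, they stand in an m:n relation, and the only dictionary-specific records are the word-meaning links.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Word:
    """A word shared by all dictionaries (hypothetical schema)."""
    word_id: int
    lemma: str
    lang: str


@dataclass(frozen=True)
class Meaning:
    """A meaning shared by all dictionaries (hypothetical schema)."""
    meaning_id: int
    gloss: str


@dataclass(frozen=True)
class Lexeme:
    """The only dictionary-specific record: one word linked to one meaning."""
    word_id: int
    meaning_id: int
    dictionary: str


@dataclass
class LexicalStore:
    words: dict[int, Word] = field(default_factory=dict)
    meanings: dict[int, Meaning] = field(default_factory=dict)
    lexemes: set[Lexeme] = field(default_factory=set)

    def add_word(self, word: Word) -> None:
        self.words[word.word_id] = word

    def add_meaning(self, meaning: Meaning) -> None:
        self.meanings[meaning.meaning_id] = meaning

    def link(self, word_id: int, meaning_id: int, dictionary: str) -> None:
        # Refuse links to words or meanings missing from the shared tables.
        if word_id not in self.words or meaning_id not in self.meanings:
            raise KeyError("unknown word or meaning")
        self.lexemes.add(Lexeme(word_id, meaning_id, dictionary))

    def meanings_of(self, word_id: int, dictionary: str | None = None) -> list[int]:
        # With dictionary=None this is the unified view across all dictionaries.
        return sorted({lx.meaning_id for lx in self.lexemes
                       if lx.word_id == word_id
                       and (dictionary is None or lx.dictionary == dictionary)})

    def words_for(self, meaning_id: int, dictionary: str | None = None) -> list[int]:
        # Inverse direction of the m:n relation: all words expressing a meaning.
        return sorted({lx.word_id for lx in self.lexemes
                       if lx.meaning_id == meaning_id
                       and (dictionary is None or lx.dictionary == dictionary)})


store = LexicalStore()
store.add_word(Word(1, "keel", "et"))
store.add_word(Word(2, "language", "en"))
store.add_meaning(Meaning(10, "system of human communication"))
store.add_meaning(Meaning(11, "muscular organ in the mouth"))

store.link(1, 10, "dict_A")      # one word, two meanings in one dictionary
store.link(1, 11, "dict_A")
store.link(1, 10, "termbase_B")  # the same word and meaning reused in a termbase
store.link(2, 10, "termbase_B")  # a second word for the same meaning

print(store.meanings_of(1))                # [10, 11]
print(store.meanings_of(1, "termbase_B"))  # [10]
print(store.words_for(10))                 # [1, 2]
```

Attaching the dictionary label to the link rather than to the word or the meaning is what allows a single word or meaning record to be reused by several dictionaries; leaving out the dictionary filter yields the unified, cross-dictionary view.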
Keywords: word sense; corpus; onomasiological termbase
Source: VideoLectures.NET