

VoxEL: A Benchmark Dataset for Multilingual Entity Linking
Course URL: http://videolectures.net/iswc2018_rosales_mendez_voxel_benchmark/
Lecturer: Henry Rosales-Méndez
Institution: University of Chile
Date: 2018-11-22
Language: English
Description: The Entity Linking (EL) task identifies entity mentions in a text corpus and associates them with the corresponding entities in a given knowledge base. While traditional EL approaches have largely focused on English texts, current trends are towards language-agnostic or otherwise multilingual approaches that can perform EL over texts in many languages. One obstacle to ongoing research on multilingual EL is the scarcity of annotated datasets with the same text in different languages. In this work we thus propose VoxEL: a manually annotated gold standard for multilingual EL featuring the same text expressed in five European languages. We first motivate and describe the VoxEL dataset, then use it to compare the behavior of state-of-the-art multilingual EL systems across the five languages, contrasting these results with those obtained using machine translation to English. Overall, our results identify how five state-of-the-art multilingual EL systems compare for various languages, how the results for different languages compare, and further suggest that machine translation of input text to English is now a competitive alternative to dedicated multilingual EL configurations.
Keywords: entity linking over text corpora; multilingual EL research; VoxEL dataset; dedicated multilingual EL configurations
Source: VideoLectures.NET
Data collected: 2023-01-16: cyh
Last reviewed: 2023-01-16: cyh