Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders
Course URL: http://videolectures.net/eswc2018_kaffee_wikipedia_summaries/
Lecturer: Lucie-Aimée Kaffee
Institution: University of Southampton
Date: 2018-07-10
Language: English
Course description: While Wikipedia exists in 287 languages, its content is unevenly distributed among them. It is therefore of utmost social and cultural importance to focus efforts on languages whose speakers have access to only limited Wikipedia content. In this work, we investigate how to support these communities by generating summaries for Wikipedia articles in underserved languages, given structured data as input. We focus on an important vehicle for such summaries: ArticlePlaceholders, which are dynamically generated content pages in underserved Wikipedia versions. They enable native speakers to access existing information in Wikidata, a structured Knowledge Base (KB). To extend these ArticlePlaceholders, we provide a system that processes the KB triples as they are provided by the ArticlePlaceholder and generates a comprehensible textual summary. This data-driven approach is employed with the goal of understanding how well it matches the communities’ needs for two underserved languages on the Web: Arabic, a language with a large community but disproportionately little access to knowledge online, and Esperanto, an easily acquired artificial language whose Wikipedia content is maintained by a small but devoted community. With the help of Arabic and Esperanto Wikipedians, we conduct a study that evaluates not only the quality of the generated text, but also the usefulness of our end-system to any underserved Wikipedia version.
Keywords: Wikipedia; structured data; end-system
Source: VideoLectures.NET
Data collected: 2022-11-08:chenjy
Last reviewed: 2022-11-08:chenjy
Views: 24