NLP Interchange Format (NIF)
Course URL: http://videolectures.net/w3cworkshop2011_hellmann_nif/
Lecturer: Sebastian Hellmann
Institution: Leipzig University
Date: 2011-12-12
Language: English
Course description: NIF is an RDF/OWL-based format that allows several NLP tools to be combined and chained in a flexible, lightweight way. The core of NIF is a vocabulary that can represent strings as RDF resources. A special URI design is used to pinpoint annotations to a part of a document. These URIs can then be used to attach arbitrary annotations to the respective character sequence, and, based on these URIs, annotations can be interchanged between different NLP tools. Although NLP tools are abundantly available at all linguistic levels for English, this is often not the case for languages with fewer speakers. It therefore becomes especially necessary to create a format that allows NLP tools to be integrated and to interoperate. With respect to multilinguality, two use cases come to mind: (1) An existing English software system that uses an English NLP tool needs to be ported to another language, but the NLP tool for the other language is not compatible with the system because there is no common interface (example: a CMS with keyword extraction). (2) Paragraphs in different kinds of documents can be annotated in RDF with multilingual translations that can potentially remain stable over the lifetime of a document. In particular, the URI recipe introduced here (Context-Hash) has advantageous properties that stand up to comparison with other URI naming approaches.
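As a rough illustration of the URI idea in the description, the Python sketch below builds two kinds of URIs for the same substring: one based on character offsets and one based on a hash of the string plus its surrounding context. The fragment layouts (`#offset_...`, `#hash_...`), the context length of 10, the `example.org` document and vocabulary URIs, and the helper names are all assumptions made for this example; they are not taken from the NIF specification, which defines its own recipes.

```python
import hashlib

DOC = "http://example.org/doc1"  # hypothetical document URI


def offset_uri(doc_uri, begin, end):
    """Identify a substring purely by its character offsets."""
    return f"{doc_uri}#offset_{begin}_{end}"


def context_hash_uri(doc_uri, text, begin, end, context_len=10):
    """Identify a substring by a hash of itself plus nearby context."""
    left = text[max(0, begin - context_len):begin]
    right = text[end:end + context_len]
    # Parentheses mark where the annotated string sits inside its context.
    message = f"{left}({text[begin:end]}){right}"
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    return f"{doc_uri}#hash_{context_len}_{end - begin}_{digest}"


def span_of(text, needle):
    """Return (begin, end) offsets of the first occurrence of needle."""
    begin = text.index(needle)
    return begin, begin + len(needle)


text = "My favourite actress is Natalie Portman."
begin, end = span_of(text, "Natalie Portman")

# Attach annotations to the string by using its URI as the triple subject.
# The predicate URIs are placeholders, not real NIF vocabulary terms.
subject = context_hash_uri(DOC, text, begin, end)
triples = [
    (subject, "http://example.org/vocab#anchorOf", text[begin:end]),
    (subject, "http://example.org/vocab#entityType", "Person"),
]
for triple in triples:
    print(triple)

# Edit the document far away from the annotated string and compare URIs.
edited = "Update: " + text
begin2, end2 = span_of(edited, "Natalie Portman")
print(offset_uri(DOC, begin, end) == offset_uri(DOC, begin2, end2))
print(context_hash_uri(DOC, text, begin, end)
      == context_hash_uri(DOC, edited, begin2, end2))
```

In this sketch, prepending text shifts the offsets, so the offset-style URI changes, while the hash-style URI stays the same as long as the annotated string and the 10 characters on either side are untouched. This is the kind of stability over a document's lifetime that the description attributes to the Context-Hash recipe.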
Keywords: NIF; format conversion; English software
Source: VideoLectures.NET
Last reviewed: 2021-12-23:liyy
Views: 87