


POWLA: Modeling Linguistic Corpora in OWL/DL
Course URL: http://videolectures.net/eswc2012_chiarcos_powla/
Lecturer: Christian Chiarcos
Institution: University of Southern California
Date: 2012-07-04
Language: English
Course description: This paper describes POWLA, a formalism for representing linguistic corpora in OWL/DL. POWLA is based on data models currently developed by the NLP community to overcome the heterogeneity of linguistic annotation (Ide and Pustejovsky 2010), in particular PAULA, an XML standoff format developed out of early sketches of the Linguistic Annotation Framework (LAF, Ide and Romary 2004), which is currently being developed within ISO TC37/SC4. These data models are defined as specializations of directed acyclic (hyper)graphs, and it is claimed that every kind of linguistic annotation can be represented as a directed (hyper)graph (Bird and Liberman 2001). Linguistic corpora can thus be naturally linearized in RDF. Unlike earlier approaches to representing generic data models for linguistic annotations by means of Semantic Web standards (e.g., Cassidy 2010), POWLA augments the RDF linearization of linguistic data with a data model formalized in an OWL/DL ontology that defines data types for primary data, annotations and linguistic metadata, as well as consistency constraints on linguistic corpora. Unlike other approaches to modeling linguistic corpora in OWL/DL (e.g., Burchardt et al. 2008), POWLA is not specific to a particular type of annotation but implements a generic data model. This genericity is illustrated here by the conversion of GrAF (the XML linearization of the Linguistic Annotation Framework, Ide and Suderman 2007) to POWLA. That POWLA preserves the linguistic information conveyed in the original GrAF data is shown by an experiment that emulates ANNIS-QL, a query language specifically designed for heterogeneous and richly annotated linguistic corpora (Chiarcos et al. 2008), by means of SPARQL macros on POWLA data. Finally, the paper identifies advantages and disadvantages of OWL/RDF linearizations of generic data models for linguistic corpora (and in particular, POWLA) as compared to traditional XML standoff formats (Ide and Suderman 2007, Chiarcos et al. 2008).
Keywords: linguistic corpora; heterogeneity; Linguistic Annotation Framework
Source: VideoLectures.NET
Last reviewed: 2019-04-13:cwx
Views: 84