
Language Technologies
Course URL: http://videolectures.net/sssw05_maynard_lt/
Lecturer: Diana Maynard
Institution: University of Sheffield
Date: 2007-02-25
Language: English
Course description: This tutorial covers the use of Human Language Technologies (HLT) for the Semantic Web and Web Services. It includes sections on HLT and Text Mining for the Semantic Web, various forms of Information Extraction, Ontology Population and Semantic Metadata Creation, and Evaluation. The tutorial begins with an introduction to Human Language Technology, looking at both its background and development, and then situates it within the context of text mining and other tasks involving knowledge discovery from large collections of unstructured text, which are necessary for the development of the Semantic Web. The second section concerns information extraction, a major component of text mining. Information extraction involves extracting facts and structured information from unstructured data. We contrast this with information retrieval, which concerns retrieving documents from large text collections, and with data mining, which concerns discovering patterns in structured data. We introduce GATE, an architecture for language engineering, and its resources for information extraction, and then expand the idea of traditional information extraction to focus on Semantic Web-enabled technology such as ontology population and semantic metadata creation, both of which involve the use of information extraction based on ontologies. We look at some current state-of-the-art semantic annotation systems such as KIM, Magpie, MnM and OntoMat. In the third section, we discuss evaluation methods for such technology, based on the idea that traditional methods are insufficient when applied to Semantic Web technology, due to the presence of hierarchical (ontological) information rather than flat structures. We also take a brief look at usability issues of annotation systems. Finally, the tutorial gives demonstrations of two examples of HLT in use for the Semantic Web. First, we present RichNews, which aims to automate the annotation of news programs by segmenting, describing and classifying news broadcasts from transcripts. Second, we present work on ontology-based and mixed-initiative information extraction carried out in the context of SEKT.
Keywords: Semantic Web; Web Services; language technologies; text mining; information extraction
Source: VideoLectures.NET
Last reviewed: 2020-01-13 (chenxin)
Views: 90