Getting at the Semantics of Texts
Course URL: http://videolectures.net/estc08_uszkoreit_gst/
Lecturer: Hans Uszkoreit
Institution: German Research Center for Artificial Intelligence (DFKI)
Date: 2008-11-24
Language: English
Course description: As semantic technologies keep evolving and maturing, there is growing interest in the gigantic wealth of knowledge encoded in so-called unstructured data. In fact, the bulk of human knowledge on the web (and in books) is represented in texts. Not even the most optimistic proponents of semantic representation standards expect that this information will be rewritten or extensively complemented with semantic metadata through intellectual labour. On the other hand, there is a discipline of science and technology called computational linguistics that has been concerned for several decades with the automatic analysis of human language. One of the original goals of this field was the automatic understanding of texts by translating them into a knowledge representation language that machines could use for reasoning. However, after sobering experience with the complexity of this task, most applied computational linguists turned to easier challenges. There is now a wide variety of human language technologies, many of which have enabled new types of products. Among these applications are text classification, email response systems, text-to-speech software, grammar checking and statistical machine translation. In this presentation, however, the state of the art and recent achievements in two strands of language technology will be explained and illustrated by examples. One of them is the automatic extraction of semantic relations, or more precisely of relation instances, from large volumes of text. Such relation instances could be events, properties of objects, or opinions on products. Using results from our own research, I will demonstrate how machine learning techniques were combined with existing advanced language analysis methods to improve such an analysis beyond the best results achievable by either of these approaches alone. I will also show how semantic domain models can be used to improve the performance of relation extraction.
The second strand of research to be presented is the deep syntactic and semantic analysis of human language. While most computational linguists had turned away from this fundamental challenge in favour of lower-hanging fruit, a few groups continued the quest for text understanding. Because of the size of the problem and the desire to develop techniques that would work for more than one language, several of them teamed up in international collaborations. I will briefly describe the two largest international collaborations in this area: the DELPH-IN initiative, dedicated to deep language processing with HPSG, and the PARGRAM initiative, pursuing the same goal with LFG. HPSG and LFG are two advanced formal models of linguistic description developed in the seventies and eighties of the last century. The PARGRAM initiative was led by PARC, and its results are among the central assets of the search technology company Powerset, which was recently acquired by Microsoft. The results of the DELPH-IN initiative are collected in a growing open-source repository of research resources. I will explain the significance of recent advances by these two consortia and related research activities. In the conclusion of the talk I will argue that combining the machine-learning approach to relation extraction with the advances in deep linguistic processing research will open the way to the exploitation of large volumes of unstructured textual data by semantic technologies.
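Keywords: knowledge encoded in unstructured data; automatic extraction of semantic relations; semantic domain models; advances in deep linguistic processing research
Source: VideoLectures.NET
Last reviewed: 2020-06-20:zyk
Views: 52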