
Néoveille - An automatic System for Lexical Units Life-Cycle Tracking
Course URL: http://videolectures.net/euralex2018_cartier_neoveille/
Lecturer: Emmanuel Cartier
Institution: Université Paris 13 Nord
Date: 2018-07-27
Language: English
Course description: This paper details methods, experiments on French, and a software prototype designed to track the life cycle of lexical units through newspaper monitor corpora. The Néoveille platform combines state-of-the-art processes for detecting and tracking linguistic change with a web platform on which linguists can create and manage their corpora, accept or reject automatically detected neologisms, describe the validated neologisms linguistically, and follow their life cycle in monitor corpora (Cartier, 2016). In this presentation, we focus on the module dedicated to life-cycle tracking. This task is challenging because it involves not the creation of a new lexical item but a new usage of an existing one. We propose to tackle this kind of change through four main parameters:
• change in the relative frequency of lexical units through time: time series analysis has a long tradition in business analytics, where mathematical models have been proposed to detect change points and trends in frequency data; corpus linguistics has also proposed several measures of diachronic change in frequency data (Hilpert and Gries, 2016); we will present the results of several measures and analyses on a contemporary French newspaper monitor corpus;
• change in the combinatorial profile of lexical units: previous approaches based on the “word sketch” (Kilgarriff, 2004) or the “behavioral profile” (Gries, 2012) have paved the way for studying the semantic signature of lexical units through collocations and collostructions. We generalize these approaches to track combinatorial change at the lexical, lexico-syntactic and syntactic levels by applying productivity measures to language models. We also propose to ground this approach theoretically in diachronic construction grammar, operationalizing so-called constructional change and constructionalization (Traugott and Trousdale, 2013);
• change in the distributional profile of lexical units: the distributional semantic approach (Pantel et al., 2010; Baroni and Lenci, 2010) makes it possible to group lexical units semantically by the similarity of their contexts, and to detect semantic change by making explicit how the set of similar lexical units differs from one period to another (Hamilton, 2016). We will present some results for French and discuss current limitations;
• diastratic and diatopic change: the last parameter tracks change by recording the textual genres, domains and geographical metadata attached to the documents in which the lexical units occur, and in turn the changes in these parameters.
For the above parameters, we will present experiments on a contemporary French corpus spanning 30 years, showing that each parameter tracks specific changes and that combining parameters enables a more fine-grained characterization of lexical change. Automatic detection thus offers lexicographers a set of tools for tracking the life cycles of lexical units, taking into account both linguistic and sociolinguistic parameters. All results will be available on the project website.
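As a rough illustration of the frequency and distributional parameters described in the abstract, the sketch below computes a per-period relative frequency, a simple monotonic trend score, and the overlap between a word's nearest neighbours in two periods. It is not the Néoveille implementation: the function names, the toy data, and the choice of Kendall's tau and Jaccard overlap are assumptions made purely for illustration, and the neighbour lists are assumed to come from some separately trained distributional model.

```python
from collections import Counter


def relative_frequencies(tokens_by_period, target):
    """Occurrences of `target` per million tokens, for each period in sorted order."""
    result = {}
    for period, tokens in sorted(tokens_by_period.items()):
        total = len(tokens)
        count = Counter(tokens)[target]
        result[period] = 1_000_000 * count / total if total else 0.0
    return result


def kendall_tau(values):
    """Kendall's tau (tau-a) between time order and values.

    +1 means the series rises at every step, -1 that it falls at every step.
    Tied pairs count as neither concordant nor discordant.
    """
    n = len(values)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            if values[j] > values[i]:
                concordant += 1
            elif values[j] < values[i]:
                discordant += 1
    pairs = n * (n - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


def neighbour_overlap(neighbours_then, neighbours_now):
    """Jaccard overlap of two nearest-neighbour lists; low values hint at semantic change."""
    a, b = set(neighbours_then), set(neighbours_now)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy corpus: one token list per year.
toy_corpus = {
    2016: ["a", "b", "viral", "c"],
    2017: ["viral", "viral", "a", "b"],
    2018: ["viral", "viral", "viral", "a"],
}
freqs = relative_frequencies(toy_corpus, "viral")
trend = kendall_tau(list(freqs.values()))
overlap = neighbour_overlap(["virus", "infection"], ["infection", "buzz"])
```

A trend score near +1 or -1 flags a unit whose relative frequency is steadily rising or falling, while a low neighbour overlap suggests that the unit's contexts have shifted between the two periods; combining both signals is in the spirit of the multi-parameter approach outlined above.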
Keywords: monitor corpus; lexical unit life-cycle tracking; life-cycle tracking system
Source: VideoLectures.NET
Data collected: 2022-02-12:zkj
Last reviewed: 2022-02-12:zkj
Views: 58