

Mining Named Entities with Temporally Correlated Bursts from Multilingual Web News Streams
Course URL: http://videolectures.net/wsdm2011_kotov_mne/
Lecturer: Alexander Kotov
Institution: University of Illinois
Date: 2011-08-09
Language: English
Abstract: In this work, we study a new text mining problem: discovering named entities with temporally correlated bursts of mention counts in multiple multilingual Web news streams. Mining such entities has many interesting and important applications, such as identifying latent events that attract the attention of online media in different countries, and extracting valuable linguistic knowledge in the form of transliterations. While mining "bursty" terms in a single text stream has been studied before, detecting terms with temporally correlated bursts in multilingual Web streams raises two new challenges: (i) correlated terms in multiple streams may have bursts whose intensities differ by orders of magnitude, and (ii) bursts of correlated terms may be separated by time gaps. We propose a two-stage method for mining items with temporally correlated bursts from multiple data streams, which addresses both challenges. In the first stage, the temporal behavior of each entity is normalized by modeling it with a Markov-Modulated Poisson Process. In the second stage, a dynamic programming algorithm is used to discover correlated bursts of different items, which may be separated by time gaps. We evaluated our method on the task of discovering transliterations of named entities from multilingual Web news streams. Experimental results indicate that our method not only effectively discovers named entities with correlated bursts in multilingual Web news streams, but also outperforms two state-of-the-art baseline methods for unsupervised discovery of transliterations in static text collections.
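The two-stage idea in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation and rests on several assumptions: stage one is simplified to a discrete-time, two-state Poisson hidden Markov model decoded with Viterbi (a stand-in for a fitted Markov-Modulated Poisson Process), and stage two uses a banded, longest-common-subsequence style dynamic program rather than the algorithm from the talk. The function names (`burst_states`, `correlated_burst_score`), the parameters (`scale`, `p_switch`, `max_gap`), and the toy counts are all hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of observing k events under a Poisson rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def burst_states(counts, scale=3.0, p_switch=0.1):
    """Stage 1 (simplified): label each time step 0 (base) or 1 (burst).

    The base rate is the stream's own mean count, so streams of very
    different magnitude are reduced to comparable 0/1 sequences.
    """
    base = max(sum(counts) / len(counts), 1e-3)
    rates = (base, base * scale)
    stay, switch = math.log(1 - p_switch), math.log(p_switch)
    # Viterbi decoding over the two hidden states.
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (stay if p == s else switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best] + poisson_logpmf(k, rates[s]))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def correlated_burst_score(a, b, max_gap=2):
    """Stage 2 (simplified): fraction of burst steps that can be matched
    in order between two 0/1 sequences, allowing a lag of up to max_gap."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
            if a[i - 1] == 1 and b[j - 1] == 1 and abs(i - j) <= max_gap:
                dp[i][j] = max(dp[i][j], dp[i - 1][j - 1] + 1)
    total = max(sum(a), sum(b))
    return dp[n][m] / total if total else 0.0

# Toy daily mention counts: the second stream bursts two days later and
# at roughly ten times the magnitude of the first.
english = [1, 0, 2, 1, 20, 25, 18, 1, 0, 1]
russian = [10, 0, 20, 10, 0, 10, 200, 250, 180, 10]
score = correlated_burst_score(burst_states(english), burst_states(russian))
```

In this toy example both streams reduce to a three-step burst, and with `max_gap=2` the bursts align fully even though they are offset in time and differ in scale. Tightening `max_gap` to 0 leaves only the single overlapping day matched, which is the situation the second challenge in the abstract describes.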
Keywords: transliteration; computer science; media
Source: VideoLectures.NET
Last reviewed: 2020-07-13 (yumf)
Views: 52