Cross-Lingual Classification of Crisis Data |
|
Course URL: | http://videolectures.net/iswc2018_khare_cross_lingua_classificati... |
Lecturer: | Prashant Khare |
Institution: | University of Cincinnati |
Date: | 2018-11-22 |
Language: | English |
Abstract: | Many citizens nowadays flock to social media during crises to share or acquire the latest information about the event. Because of the sheer volume of data typically circulated during such events, it is necessary to filter out irrelevant posts efficiently and focus attention on the posts that are truly relevant to the crisis. Recent research has experimented with various statistical and semantic methods to automatically classify posts as relevant or irrelevant to a given crisis or set of crises. However, it is unclear how such approaches perform when the posts about a crisis are written in different languages. The typical approach is to train a model for each language, but this is costly and time-consuming, and it is not a viable option for rapidly evolving crisis situations. In this paper we test statistical and semantic classification approaches on cross-lingual datasets from 30 crisis events, consisting of posts written mainly in English, Spanish, and Italian. We experiment with scenarios where the model is trained on one language and tested on another, and where the data is translated into a single language. We show that adding semantic features extracted from external knowledge bases increases accuracy over the purely statistical model. |
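Keywords: | posts generated in different languages; cross-lingual datasets; statistical and semantic classification approaches; extracted semantic features |
Source: | VideoLectures.NET |
Data collected: | 2023-01-07:cyh |
Last reviewed: | 2023-01-07:cyh |
Views: | 37 |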