
Highly Multilingual News Analysis Applications
Course URL: http://videolectures.net/ecmlpkdd09_steinberger_hmna/  
Lecturer: Ralf Steinberger
Institution: European Commission
Date: 2009-10-20
Language: English
Course description: The publicly accessible Europe Media Monitor (EMM) family of applications gathers and analyses an average of 80,000 to 100,000 online news articles per day in up to 43 languages. By extracting meta-information from these articles, the applications provide an aggregated view of the news and allow users to monitor trends and to navigate the news over time and even across languages. EMM-NewsExplorer additionally collects historical information about persons and organisations from the multilingual news, generates co-occurrence and quotation-based social networks, and more. All EMM applications were developed entirely at, and are maintained by, the European Commission's Joint Research Centre (JRC) in Ispra, Italy. The applications combine a variety of text analysis tools, including clustering, multi-label document classification, named entity recognition, name variant matching across languages and writing systems, topic detection and tracking, event scenario template filling, and more. Because of the large number of languages covered, linguistics-poor methods were used to develop these text mining components. The speaker will give an overview of the various applications and will then explain the workings of selected text analysis components.
Keywords: clustering; multi-label document classification; computer science
Source: VideoLectures.NET
Last reviewed: 2021-12-23:liyy
Views: 40