Probabilistic Topic Modeling in Multilingual Settings: A Short Overview of Its Methodology and Applications
Course URL: http://videolectures.net/nipsworkshops2012_vulic_topic_modeling/  
Lecturer: Ivan Vulić
Institution: University of Leuven (KU Leuven)
Date: 2013-01-11
Language: English
Course description: Probabilistic topic models are unsupervised generative models that model document content as a two-step generation process: documents are treated as mixtures of latent topics, while topics are probability distributions over vocabulary words. Recently, significant research effort has gone into transferring the probabilistic topic modeling concept from monolingual to multilingual settings. Novel topic models have been designed to work with parallel and comparable texts. We define the concept of multilingual probabilistic topic modeling and present a short high-level overview of the current research and methodology. As a representative example, the appendix thoroughly describes a multilingual probabilistic topic model called bilingual LDA (BiLDA), trained on comparable data. The paper also gives a short overview of the cross-lingual applications for which we have used the model in our research so far.
Keywords: probabilistic topic models; multilingual
Source: VideoLectures.NET
Data collected: 2020-11-30:zyk
Last reviewed: 2020-11-30:zyk
Views: 53
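
The two-step generation process mentioned in the course description can be illustrated with a small simulation. The sketch below is not taken from the lecture. It assumes the commonly described BiLDA setup in which a pair of comparable documents shares one topic mixture while each language has its own topic-word distributions. The vocabularies, the hyperparameter values, and the function names (`sample_dirichlet`, `generate_document_pair`) are hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(alpha, dim, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) of size dim."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document_pair(phi_src, phi_tgt, vocab_src, vocab_tgt,
                           alpha, len_src, len_tgt, rng):
    """Generate one bilingual document pair (assumed BiLDA-style process).

    Step 1: draw a topic mixture theta shared by both documents.
    Step 2: for each word position, draw a topic from theta, then draw
            a word from that topic's language-specific distribution.
    """
    num_topics = len(phi_src)
    theta = sample_dirichlet(alpha, num_topics, rng)

    def generate(phi, vocab, length):
        words = []
        for _ in range(length):
            topic = rng.choices(range(num_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=phi[topic])[0])
        return words

    doc_src = generate(phi_src, vocab_src, len_src)
    doc_tgt = generate(phi_tgt, vocab_tgt, len_tgt)
    return theta, doc_src, doc_tgt


# Hypothetical toy setup: two topics, four-word vocabularies.
rng = random.Random(0)
vocab_en = ["game", "team", "market", "money"]
vocab_nl = ["wedstrijd", "ploeg", "markt", "geld"]
num_topics = 2

# Each topic is a probability distribution over one language's vocabulary.
phi_en = [sample_dirichlet(0.5, len(vocab_en), rng) for _ in range(num_topics)]
phi_nl = [sample_dirichlet(0.5, len(vocab_nl), rng) for _ in range(num_topics)]

theta, doc_en, doc_nl = generate_document_pair(
    phi_en, phi_nl, vocab_en, vocab_nl,
    alpha=0.5, len_src=8, len_tgt=6, rng=rng,
)
print("shared topic mixture:", theta)
print("English document:", doc_en)
print("Dutch document:", doc_nl)
```

Training a topic model amounts to inverting this process: given only the observed words, inference estimates the hidden topic mixtures and topic-word distributions. The sketch covers the forward (generative) direction only.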