Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
Course URL: http://videolectures.net/uai08_mccallum_tmcaf/
Lecturer: Andrew McCallum
Institution: University of Massachusetts
Date: 2008-07-30
Language: English
Abstract: Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
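
The log-linear prior mentioned in the abstract can be illustrated with a short sketch. It assumes the prior takes the form alpha_k = exp(x · lambda_k), where x is a document's feature vector and lambda_k is a per-topic weight vector, and that the document-topic distribution is then drawn from Dirichlet(alpha). The feature names, weight values, and function names below are hypothetical and chosen only for illustration; they are not taken from the lecture.

```python
import math
import random


def dmr_alpha(features, weights):
    """Compute Dirichlet parameters alpha_k = exp(x . lambda_k) for one document.

    `features` is the document's feature vector x; `weights` holds one
    weight vector lambda_k per topic (assumed log-linear form).
    """
    return [
        math.exp(sum(x * w for x, w in zip(features, topic_w)))
        for topic_w in weights
    ]


def sample_topic_distribution(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


# Hypothetical binary features: [bias, written_by_author_A, published_at_venue_B]
doc_features = [1.0, 1.0, 0.0]

# Hypothetical weights, one row per topic (three topics here).
topic_weights = [
    [0.0, 2.0, 0.0],   # topic 0 is favored by author A
    [0.0, 0.0, 2.0],   # topic 1 is favored by venue B
    [-1.0, 0.0, 0.0],  # topic 2 is rare regardless of metadata
]

alpha = dmr_alpha(doc_features, topic_weights)
rng = random.Random(0)  # fixed seed so the draw is reproducible
theta = sample_topic_distribution(alpha, rng)
```

In this sketch the document's metadata shifts prior mass toward topic 0, because the author feature is active and carries a positive weight for that topic. A full model would also learn the weights from data, which is not shown here.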
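Keywords: modeling the contents of text documents; document metadata; Dirichlet-multinomial regression topic model
Source: VideoLectures.NET
Last reviewed: 2020-07-13:yumf
Views: 236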