Latent Topic Models for Hypertext
Course URL: http://videolectures.net/uai08_gruber_ltm/
Lecturer: Amit Gruber
Institution: Hebrew University of Jerusalem, Israel
Date: 2008-07-30
Language: English
Course description: Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collections such as the Internet, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in the same way they model words: the document-link co-occurrence matrix is modeled just as the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, a link from a word w to a document d depends directly on how frequent the topic of w is in d, in addition to the in-degree of d. We show how to perform EM learning on this model efficiently. By not modeling links as analogous to words, we end up using far fewer free parameters and obtain better link prediction results.
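Keywords: latent topic models; unsupervised topic discovery; hypertext documents; matrix modeling; probabilistic generative models; free parameters; link prediction
Source: VideoLectures.NET
Last reviewed: 2020-06-08:cxin
Views: 61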
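
Illustrative sketch: the description states that a link from a word w to a document d depends on how frequent the topic of w is in d and on the in-degree of d. The short Python example below shows one way such a score could be computed. It is a rough sketch under stated assumptions, not the model from the lecture: the product form, the per-document `importance` weight standing in for the in-degree effect, and all names and numbers are hypothetical.

```python
# Hypothetical per-document topic proportions (each row sums to 1).
theta = {
    "doc_a": [0.7, 0.2, 0.1],
    "doc_b": [0.1, 0.1, 0.8],
}

# Hypothetical per-document weight standing in for the in-degree effect.
importance = {"doc_a": 0.05, "doc_b": 0.20}


def link_probability(topic_mix_target, word_topic, doc_importance):
    """Assumed score: target importance times the share of the word's
    topic in the target document."""
    return doc_importance * topic_mix_target[word_topic]


def rank_targets(word_topic):
    """Rank candidate target documents for a word with the given topic."""
    scores = {
        doc: link_probability(theta[doc], word_topic, importance[doc])
        for doc in theta
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for topic in range(3):
        print(topic, rank_targets(topic))
```

Under this assumed form, each document contributes a single link-related parameter instead of a separate entry per document-link pair, which is in the spirit of the "far fewer free parameters" remark in the description.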