

TopicFlow Model: Unsupervised Learning of Topic-specific Influences of Hyperlinked Documents
Course URL: http://videolectures.net/aistats2011_nallapati_model/
Lecturer: Ramesh Nallapati
Institution: Carnegie Mellon University
Date: 2011-05-06
Language: English
Course description: Popular algorithms for modeling the influence of entities in networked data, such as PageRank, work by analyzing the hyperlink structure but ignore the contents of the documents. Influence, however, is often topic dependent: a web page that is highly influential in politics may be an unknown entity in sports. We design a new model called TopicFlow, which combines ideas from network flow and topic modeling to learn this notion of topic-specific influence of hyperlinked documents in a completely unsupervised fashion. On the task of citation recommendation, which is an instance of capturing influence, the TopicFlow model, when combined with TF-IDF based cosine similarity, outperforms several competitive baselines by as much as 11.8%. Our empirical study of the model’s output on the ACL corpus demonstrates its ability to identify topically influential documents. The TopicFlow model is also competitive with the state-of-the-art Relational Topic Models in predicting the likelihood of unseen text on two different data sets. Because it learns topic-specific flows across each hyperlink, the TopicFlow model can be a powerful visualization tool for tracking the diffusion of topics across a citation network.
Keywords: TopicFlow model; topics of hyperlinked documents; unsupervised learning
Source: VideoLectures.NET
Last reviewed: 2020-06-03, 王勇彬 (volunteer course editor)