Multiscale Topic Tomography
Course URL: http://videolectures.net/kdd07_nallapati_mtt/
Lecturer: Ramesh Nallapati
Institution: Carnegie Mellon University
Date: 2007-09-14
Language: English
Course description: Modeling how topics evolve over time is of great value in the automatic summarization and analysis of large document collections. In this work, we propose a new probabilistic graphical model to address this problem. The new model, which we call the Multiscale Topic Tomography Model (MTTM), employs non-homogeneous Poisson processes to model the generation of word counts. The evolution of topics is modeled through a multiscale analysis using Haar wavelets. A novel feature of the model is that it captures the evolution of topics at several time-scale resolutions, allowing the user to zoom in and out along the time axis. Our experiments on Science data using the new model uncover some interesting patterns in topics. The new model is also comparable to LDA in predicting unseen data, as demonstrated by our perplexity experiments.
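The "zoom in and out" idea in the description can be illustrated with a minimal sketch of a Haar-style multiscale decomposition of word counts over time. This is not the authors' implementation: it only builds the pyramid of pairwise sums over observed counts and the left/right split fractions at each scale, whereas MTTM places a probabilistic model over Poisson rates per topic. The function names `multiscale_counts` and `split_fractions` and the sample counts are hypothetical, chosen for illustration.

```python
def multiscale_counts(counts):
    """Build a pyramid of counts: finest time scale first, coarsest last.

    Each coarser level sums adjacent pairs of the level below, which is
    the unnormalized Haar scaling step. The sum of two independent Poisson
    counts is again Poisson, so every level remains a valid count series.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("number of epochs must be a power of two")
    levels = [list(counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def split_fractions(levels):
    """For each parent interval, the fraction of its count in the left child.

    These fractions play the role of the Haar detail coefficients: together
    with the total count they reconstruct every finer level exactly.
    An empty parent is assigned 0.5 by convention (an assumption).
    """
    fractions = []
    for fine, coarse in zip(levels, levels[1:]):
        fractions.append([
            fine[2 * j] / parent if parent else 0.5
            for j, parent in enumerate(coarse)
        ])
    return fractions


# Hypothetical counts of one word in one topic over 8 consecutive epochs.
word_counts = [3, 5, 2, 6, 9, 7, 4, 4]
levels = multiscale_counts(word_counts)
fractions = split_fractions(levels)

for depth, level in enumerate(levels):
    print(f"scale {depth}: {level}")
```

Reading the levels from last to first corresponds to zooming in: the single total at the coarsest scale is split into halves, quarters, and finally individual epochs.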
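Keywords: multiscale topic tomography model; non-homogeneous Poisson process; experiments on Science data
Source: VideoLectures.NET
Last reviewed: 2020-06-01, 吴雨秋 (volunteer course editor)
Views: 37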