Reading Tea Leaves: How Humans Interpret Topic Models
Course URL: http://videolectures.net/nips09_boyd_graber_rtl/  
Lecturer: Jordan Boyd-Graber
Institution: University of Maryland
Date: 2010-01-19
Language: English
Course description: Probabilistic topic models are a commonly used tool for analyzing text data, where the latent topic representation is used to perform qualitative evaluation of models and guide corpus exploration. Practitioners typically assume that the latent space is semantically meaningful, but this important property has lacked a quantitative evaluation. In this paper, we present new quantitative methods for measuring semantic meaning in inferred topics. We back these measures with large-scale user studies, showing that they capture aspects of the model that are undetected by measures of model quality based on held-out likelihood. Surprisingly, topic models which perform better on held-out likelihood may actually infer less semantically meaningful topics.
Keywords: probability; text data; semantics
Source: VideoLectures.NET
Last reviewed: 2019-09-06 (lxf)
Views: 54