Collaboratively Improving Topic Discovery and Word Embeddings by Coordinating Global and Local Contexts |
|
Course URL: | http://videolectures.net/kdd2017_xun_contexts/ |
Lecturer: | Guangxu Xun |
Institution: | University at Buffalo |
Date: | 2017-10-09 |
Language: | English |
Abstract: | A text corpus typically contains two types of context information: global context and local context. Global context carries topical information, which topic models can use to discover topic structures in the corpus, while local context can be used to train word embeddings that capture the semantic regularities reflected in the corpus. This motivates exploiting the useful information in both. In this paper, we propose a unified language model based on matrix factorization techniques that 1) takes the complementary global and local context information into account simultaneously, and 2) models topics and learns word embeddings collaboratively. We empirically show that, by incorporating both global and local context, this collaborative model not only significantly improves topic discovery over the baseline topic models but also learns better word embeddings than the baseline word embedding models. We also provide a qualitative analysis that explains how the cooperation of global and local context information can result in better topic structures and word embeddings. |
Keywords: | word embeddings; text corpus; data science |
Source: | VideoLectures.NET |
Data collected: | 2023-12-27:wujk |
Last reviewed: | 2023-12-27:wujk |
Views: | 16 |
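The abstract describes a matrix factorization model that couples global (document-level) and local (window-level) context. This record does not give the authors' objective function, so the following is only a minimal illustrative sketch of the general idea, not the method from the talk: a document-word count matrix and a PPMI co-occurrence matrix are factorized jointly with a shared word factor matrix, using ridge-regularized alternating least squares. The toy corpus, the function names (`ppmi`, `joint_loss`, `joint_factorize`), and the parameters `k`, `lam`, `reg`, and `window` are all assumptions made for illustration.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "topic models discover topics in documents".split(),
    "word embeddings capture semantic regularities".split(),
    "topic models and word embeddings use context".split(),
]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Global context: document-word count matrix D (n_docs x V).
D = np.zeros((len(docs), V))
for j, d in enumerate(docs):
    for w in d:
        D[j, idx[w]] += 1

# Local context: co-occurrence counts within a symmetric window.
window = 2
cooc = np.zeros((V, V))
for d in docs:
    for i, w in enumerate(d):
        for c in d[max(0, i - window):i] + d[i + 1:i + 1 + window]:
            cooc[idx[w], idx[c]] += 1


def ppmi(cooc):
    """Positive pointwise mutual information of a co-occurrence matrix."""
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts contribute nothing
    return np.maximum(pmi, 0.0)


def joint_loss(D, M, Theta, W, C, lam, reg):
    """Global fit + lam * local fit + ridge penalty on all factors."""
    fit_global = np.sum((D - Theta @ W.T) ** 2)
    fit_local = np.sum((M - W @ C.T) ** 2)
    penalty = reg * (np.sum(Theta ** 2) + np.sum(W ** 2) + np.sum(C ** 2))
    return fit_global + lam * fit_local + penalty


def joint_factorize(D, M, k=2, lam=1.0, reg=0.1, iters=20, seed=0):
    """Alternating least squares with the word factors W shared by both terms."""
    rng = np.random.default_rng(seed)
    Theta = rng.normal(scale=0.1, size=(D.shape[0], k))  # document factors
    W = rng.normal(scale=0.1, size=(D.shape[1], k))      # shared word factors
    C = rng.normal(scale=0.1, size=(M.shape[1], k))      # context-word factors
    eye = np.eye(k)
    losses = [joint_loss(D, M, Theta, W, C, lam, reg)]
    for _ in range(iters):
        # Each update is the exact minimizer of the objective in one block.
        Theta = np.linalg.solve(W.T @ W + reg * eye, (D @ W).T).T
        C = np.linalg.solve(W.T @ W + (reg / lam) * eye, (M.T @ W).T).T
        A = Theta.T @ Theta + lam * (C.T @ C) + reg * eye
        W = np.linalg.solve(A, (D.T @ Theta + lam * (M @ C)).T).T
        losses.append(joint_loss(D, M, Theta, W, C, lam, reg))
    return Theta, W, C, losses


M = ppmi(cooc)
Theta, W, C, losses = joint_factorize(D, M)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

In this sketch the rows of `W` play the role of word embeddings and the rows of `Theta` act as document-level latent mixtures; `lam` controls how strongly the local-context term influences the shared factors. A topic model in the usual sense would additionally need nonnegativity or probabilistic constraints, which are omitted here for brevity.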