From Word Embeddings To Document Distances
Course URL: http://videolectures.net/icml2015_kusner_document_distances/
Lecturer: Matt J. Kusner
Institution: Cornell University
Date: 2015-12-05
Language: English
Course description: We present the Word Mover’s Distance (WMD), a novel distance function between text documents. Our work is based on recent results in word embeddings that learn semantically meaningful representations for words from local co-occurrences in sentences. The WMD measures the dissimilarity between two text documents as the minimum amount of distance that the embedded words of one document need to “travel” to reach the embedded words of another document. We show that this distance metric can be cast as an instance of the Earth Mover’s Distance, a well-studied transportation problem for which several highly efficient solvers have been developed. Our metric has no hyperparameters and is straightforward to implement. Further, we demonstrate on eight real-world document classification data sets, in comparison with seven state-of-the-art baselines, that the WMD metric leads to unprecedentedly low k-nearest neighbor document classification error rates.
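
Illustrative sketch (not part of the lecture materials): the description above casts WMD as a transportation problem, so a small instance can be solved with a general linear-programming routine. The Python code below assumes each document is given as normalized word weights plus one embedding vector per word, and uses Euclidean distance between embeddings as the travel cost. The function name `word_movers_distance` and the two-dimensional toy "embeddings" are hypothetical, chosen only for this example; a dedicated Earth Mover's Distance solver would be the practical choice for real documents.

```python
import numpy as np
from scipy.optimize import linprog


def word_movers_distance(weights_a, weights_b, vectors_a, vectors_b):
    """Minimum total travel cost to move the word mass of A onto B.

    weights_*: 1-D arrays of non-negative word weights, each summing to 1.
    vectors_*: 2-D arrays, one embedding row per word (hypothetical inputs).
    """
    n, m = len(weights_a), len(weights_b)

    # cost[i, j] = Euclidean distance between word i of A and word j of B.
    cost = np.linalg.norm(vectors_a[:, None, :] - vectors_b[None, :, :], axis=2)

    # The unknown is a flow matrix T (n x m), flattened row by row.
    # Row i of T must sum to weights_a[i]; column j must sum to weights_b[j].
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        a_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([weights_a, weights_b])

    # Minimize sum_ij T[i, j] * cost[i, j] subject to T >= 0.
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None))
    if not result.success:
        raise RuntimeError(result.message)
    return float(result.fun)


# Toy data: two words per document, equal weights, made-up 2-D embeddings.
weights = np.array([0.5, 0.5])
doc_a = np.array([[0.0, 0.0], [1.0, 0.0]])
doc_b = np.array([[0.0, 1.0], [1.0, 1.0]])

distance_ab = word_movers_distance(weights, weights, doc_a, doc_b)
distance_aa = word_movers_distance(weights, weights, doc_a, doc_a)
print(distance_ab, distance_aa)
```

In this toy case each word of the first document has a counterpart exactly one unit away in the second, so the cheapest plan moves each half of the mass by distance 1, and a document compared with itself costs nothing to move.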
Keywords: word embeddings; machine learning; document distance
Source: VideoLectures.NET
Data collected: 2023-12-25:wujk
Last reviewed: 2023-12-25:wujk