PLSI: The True Fisher Kernel and Beyond
Course URL: http://videolectures.net/ecmlpkdd09_eckard_plsitfkb/
Lecturer: Emmanuel Eckard
Institution: École Polytechnique Fédérale de Lausanne (EPFL)
Course date: Not available.
Course language: English
Course description: The Probabilistic Latent Semantic Indexing (PLSI) model, introduced by T. Hofmann (1999), has engendered applications in numerous fields, notably document classification and information retrieval. In this context, the Fisher kernel was found to be an appropriate document similarity measure. However, the kernels published so far contain unjustified features, some of which hinder their performance. Furthermore, PLSI is not generative for unknown documents, a shortcoming usually remedied by "folding them in" to the PLSI parameter space. This paper contributes on both points by (1) introducing a new, rigorous development of the Fisher kernel for PLSI, addressing the role of the Fisher Information Matrix, and uncovering its relation to the kernels proposed so far; and (2) proposing a novel and theoretically sound document similarity, which avoids the problem of "folding in" unknown documents. For both aspects, experimental results are provided on several information retrieval evaluation sets.
Keywords: Fisher kernel; text mining; document classification; information retrieval
Course source: VideoLectures.NET
Last reviewed: 2019-12-05 (cwx)
Views: 59