


KATE: K-Competitive Autoencoder for Text
Course URL: http://videolectures.net/kdd2017_chen_autoencoder_for_text/
Lecturer: Yu Chen
Offering institution: VideoLectures.NET
Date: 2017-10-09
Language: English
Abstract: Autoencoders have been successful in learning meaningful representations from image datasets. However, their performance on text datasets has not been widely studied. Traditional autoencoders tend to learn possibly trivial representations of text documents due to their confounding properties, such as high dimensionality, sparsity, and power-law word distributions. In this paper, we propose a novel k-competitive autoencoder, called KATE, for text documents. Due to the competition between the neurons in the hidden layer, each neuron becomes specialized in recognizing specific data patterns, and overall the model can learn meaningful representations of textual data. A comprehensive set of experiments shows that KATE can learn better representations than traditional autoencoders, including denoising, contractive, variational, and k-sparse autoencoders. Our model also outperforms deep generative models, probabilistic topic models, and even word representation models (e.g., Word2Vec) on several downstream tasks such as document classification, regression, and retrieval.
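The competition the abstract describes can be illustrated with a minimal sketch of a k-competitive activation: only the k hidden units with the largest absolute activation stay active, and the "energy" of the silenced units is reallocated to the winners. The function name, the single `alpha` amplification factor, and the exact redistribution rule here are assumptions for illustration; the paper's actual layer handles positive and negative activations separately.

```python
import numpy as np

def k_competitive(z, k, alpha=1.0):
    """Sketch of a k-competitive step (assumed simplified form, not the
    paper's exact layer): keep the k strongest hidden activations, zero
    the rest, and redistribute the losers' absolute activation ("energy")
    to the winners, scaled by a hypothetical factor alpha."""
    z = np.asarray(z, dtype=float)
    if k >= z.size:
        return z.copy()
    winners = np.argsort(np.abs(z))[-k:]       # indices of the top-k units
    mask = np.zeros_like(z)
    mask[winners] = 1.0
    lost = np.sum(np.abs(z) * (1.0 - mask))    # energy of silenced units
    out = z * mask                             # losers are set to zero
    # amplify each winner in its own sign direction with a share of `lost`
    out[winners] += alpha * lost * np.sign(z[winners]) / k
    return out

# e.g. k_competitive([0.5, -0.1, 2.0, -1.5], k=2)
# keeps units 2 and 3 and boosts them by the 0.6 of energy lost elsewhere
```

Because every unit must outcompete its peers to stay active, each one is pushed to specialize in distinct data patterns, which is the intuition the abstract gives for why KATE learns meaningful representations of sparse text.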
Keywords: image data; text data; autoencoder
Source: VideoLectures.NET
Data collected: 2022-12-02: chenxin01
Last reviewed: 2022-12-02: chenxin01
Views: 27