Model-based Clustering of Short Text Streams
Course URL: http://videolectures.net/kdd2018_chao_model-based_clustering/
Lecturer: Daren Chao
Institution: School of Computer Science and Communication, KTH Royal Institute of Technology
Date: 2018-11-23
Language: English
Course description: Short text stream clustering has become an increasingly important problem due to the explosive growth of short text on social media. In this paper, we propose a model-based short text stream clustering algorithm (MStream) that handles both concept drift and sparsity naturally. MStream achieves state-of-the-art performance with only one pass over the stream, and performs even better when multiple iterations over each batch are allowed. We further propose an improved version of MStream with forgetting rules, called MStreamF, which efficiently deletes outdated documents by deleting the clusters of outdated batches. Our extensive experimental study shows that MStream and MStreamF outperform three baselines on several real datasets.
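To make the one-pass and forgetting ideas in the description concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a Dirichlet-multinomial score for assigning a document to an existing or new cluster, replaces sampling with a greedy arg-max so the result is deterministic, and forgets a batch by subtracting its documents from their clusters. The class name `OnePassClusterer` and the parameters `alpha`, `beta`, `vocab_size`, and `max_batches` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class OnePassClusterer:
    """Greedy one-pass clustering of token lists (illustrative sketch only)."""

    def __init__(self, vocab_size, alpha=0.03, beta=0.03, max_batches=2):
        self.alpha, self.beta, self.V = alpha, beta, vocab_size
        self.max_batches = max_batches      # number of recent batches to keep
        self.docs = defaultdict(int)        # cluster id -> number of documents
        self.words = defaultdict(Counter)   # cluster id -> word counts
        self.total = defaultdict(int)       # cluster id -> total word count
        self.history = []                   # per batch: list of (cluster id, tokens)
        self.next_id = 0

    def _log_score(self, tokens, z):
        """Unnormalised log score of placing tokens in cluster z (None = new cluster)."""
        if z is None:
            n_docs = sum(self.docs.values())
            prior, wc, tot = self.alpha * (n_docs + 1), Counter(), 0
        else:
            prior, wc, tot = self.docs[z], self.words[z], self.total[z]
        score = math.log(prior)
        for w, c in Counter(tokens).items():
            for j in range(c):
                score += math.log(wc[w] + self.beta + j)
        for i in range(len(tokens)):
            score -= math.log(tot + self.V * self.beta + i)
        return score

    def _update(self, tokens, z, sign):
        """Add (sign=+1) or remove (sign=-1) one document from cluster z."""
        self.docs[z] += sign
        self.total[z] += sign * len(tokens)
        for w in tokens:
            self.words[z][w] += sign
        if self.docs[z] == 0:               # drop clusters that became empty
            for table in (self.docs, self.words, self.total):
                del table[z]

    def add_batch(self, batch):
        """Cluster one batch in a single pass and return a cluster id per document."""
        # Forgetting rule (assumed form): remove the oldest batch's documents.
        if len(self.history) == self.max_batches:
            for z, tokens in self.history.pop(0):
                self._update(tokens, z, -1)
        assigned = []
        for tokens in batch:
            candidates = list(self.docs) + [None]
            z = max(candidates, key=lambda c: self._log_score(tokens, c))
            if z is None:                   # open a new cluster
                z, self.next_id = self.next_id, self.next_id + 1
            self._update(tokens, z, +1)
            assigned.append((z, tokens))
        self.history.append(assigned)
        return [z for z, _ in assigned]
```

A caller would tokenize each short text, group documents into batches, and pass each batch to `add_batch`. The number of clusters is not fixed in advance: a new cluster opens whenever the new-cluster score beats every existing cluster, which is one way to accommodate topics that appear later in the stream.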
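Keywords: short text stream clustering; short text stream clustering algorithm; MStream with forgetting rules; MStream and MStreamF
Source: VideoLectures.NET
Data collected: 2023-01-24: cyh
Last reviewed: 2023-01-24: cyh
Views: 21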