
Model-based Clustering of Short Text Streams
Course URL: http://videolectures.net/kdd2018_chao_model-based_clustering/
Lecturer: Daren Chao
Institution: KTH Royal Institute of Technology, School of Computer Science and Communication
Date: 2018-11-23
Language: English
Abstract: Short text stream clustering has become an increasingly important problem due to the explosive growth of short text on diverse social media. In this paper, we propose a model-based short text stream clustering algorithm (MStream) that naturally handles both the concept drift problem and the sparsity problem. MStream achieves state-of-the-art performance with only one pass over the stream, and performs even better when multiple iterations over each batch are allowed. We further propose MStreamF, an improved version of MStream with forgetting rules, which efficiently deletes outdated documents by deleting the clusters of outdated batches. Our extensive experimental study shows that MStream and MStreamF outperform three baselines on several real datasets.
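Keywords: short text stream clustering; short text stream clustering algorithm; MStream algorithm with forgetting rules; MStream and MStreamF
Source: VideoLectures.NET
Data collected: 2023-01-24:cyh
Last reviewed: 2023-01-24:cyh
Views: 30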