Patterns of Temporal Variation in Online Media
Course URL: http://videolectures.net/wsdm2011_yang_tvo/
Instructor: Jaewon Yang
Institution: Stanford University
Course date: Not available.
Course language: English
Course description: Online content exhibits rich temporal dynamics, and diverse real-time user-generated content further intensifies this process. However, the temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention, remain largely unexplored. We study temporal patterns associated with online content and how the content's popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content, we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm, which effectively finds cluster centroids under our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large data sets. We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of content on the Web and broaden the understanding of the dynamics of human attention.
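To make the idea of a similarity measure "invariant to scaling and shifting" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the lecture's implementation: the function names `shift` and `scale_shift_distance`, the zero-padding used when shifting, and the brute-force search over shifts are all choices made here for clarity. For each candidate shift, the best scaling factor is found in closed form by least squares, and the residual is normalized by the norm of the first series.

```python
import numpy as np


def shift(y, q):
    """Shift series y by q steps (positive = later in time), zero-padded.

    Zero-padding is an assumption made for this sketch.
    """
    out = np.zeros_like(y)
    if q > 0:
        out[q:] = y[:-q]
    elif q < 0:
        out[:q] = y[-q:]
    else:
        out[:] = y
    return out


def scale_shift_distance(x, y, max_shift=None):
    """Distance between x and y that ignores overall volume and peak time.

    Computes min over shifts q and scales alpha of
    ||x - alpha * shift(y, q)|| / ||x||.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    norm_x = np.linalg.norm(x)
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    if max_shift is None:
        max_shift = len(x) - 1

    best = np.inf
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        denom = yq @ yq
        # Least-squares optimal scale for this shift (0 if yq is all zeros).
        alpha = (x @ yq) / denom if denom > 0 else 0.0
        d = np.linalg.norm(x - alpha * yq) / norm_x
        best = min(best, d)
    return best


# Two series with the same shape but different volume and peak time.
a = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 0.0])
b = 5.0 * shift(a, 2)
print(scale_shift_distance(a, b))
```

In this usage example the two series differ only in volume and timing, so the distance is (numerically) zero, whereas a plain Euclidean distance, as used by K-means, would treat them as far apart.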
Keywords: Computer Science; Web Mining; Clustering
Course source: VideoLectures.NET
Last reviewed: 2019-10-30:cwx
Views: 51