Meme-tracking and the dynamics of the news cycle
Course URL: http://videolectures.net/kdd09_leskovec_mtatd/
Lecturer: Jure Leskovec
Institution: Stanford University
Date: 2009-09-14
Language: English
Course description: Tracking new topics, ideas, and "memes" across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to identifying content that spreads widely and then fades over time scales on the order of days, which is the time scale at which we perceive news and events. We develop a framework for tracking short, distinctive phrases that travel relatively intact through online text. Developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis. As our principal domain of study, we show how such a meme-tracking approach can provide a coherent representation of the news cycle: the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis. We tracked 1.6 million mainstream media sites and blogs over a period of three months, for a total of 90 million articles, and we find a set of novel and persistent temporal patterns in the news cycle. In particular, we observe a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs, respectively, with divergent behavior around the overall peak and a "heartbeat"-like pattern in the handoff between news and blogs. We also develop and analyze a mathematical model for the kinds of temporal variation that the system exhibits.
Keywords: time scales; short-phrase tracking framework; quantitative analysis; mathematical model
Source: VideoLectures.NET
Last reviewed: 2019-12-21:lxf
Views: 49