Correcting for Missing Data in Information Cascades
Course URL: http://videolectures.net/wsdm2011_leskovec_cmd/
Lecturer: Jure Leskovec
Institution: Stanford University
Date: 2011-08-09
Language: English
Course description: Transmission of infectious diseases, propagation of information, and the spread of ideas and influence through social networks are all examples of diffusion. In such cases we say that a contagion spreads through the network, a process that can be modeled by a cascade graph. Studying cascades and network diffusion is challenging because of missing data: even a single missing observation in a sequence of propagation events can significantly alter our inferences about the diffusion process. We address the problem of missing data in information cascades. Specifically, given only a fraction C′ of the complete cascade C, our goal is to estimate properties of the complete cascade C, such as its size or depth. To do so, we first formulate a k-tree model of cascades and analytically study its properties in the face of missing data. We then propose a numerical method that, given a cascade model and an observed cascade C′, can estimate properties of the complete cascade C. We evaluate our methodology using information propagation cascades in the Twitter network (70 million nodes and 2 billion edges), as well as information cascades arising in the blogosphere. Our experiments show that the k-tree model is an effective tool for studying the effects of missing data in cascades. Most importantly, we show that our method (and the k-tree model) can accurately estimate properties of the complete cascade C even when 90% of the data is missing.
Keywords: missing data; information cascades; k-tree model
Source: VideoLectures.NET
Last reviewed: 2020-06-29:yumf
Views: 56