How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media? |
|
Course URL: | http://videolectures.net/icwsm2010_dechoudhury_hdd/ |
Lecturer: | Munmun De Choudhury |
Institution: | Georgia Institute of Technology |
Date: | 2010-06-29 |
Language: | English |
Abstract: | Platforms such as Twitter have provided researchers with ample opportunities to study social phenomena analytically. There are, however, significant computational challenges due to the enormous rate at which new information is produced: researchers are therefore often forced to analyze a judiciously selected "sample" of the data. Like other social media phenomena, information diffusion is a social process: it is affected by user context and topic, in addition to the graph topology. This paper studies the impact of different attribute-based and topology-based sampling strategies on the discovery of an important social media phenomenon: information diffusion. We examine several widely adopted sampling methods that select nodes based on attribute (random, location, and activity) and topology (forest fire), and we also study the impact of attribute-based seed selection on topology-based sampling. We then develop a series of metrics for evaluating the quality of the sample, based on user activity (e.g. volume, number of seeds), topological characteristics (e.g. reach, spread), and temporal characteristics (e.g. rate). We additionally correlate the diffusion volume metric with two external variables: search and news trends. Our experiments reveal that for small sample sizes (30%), a sample that incorporates both topology and user context (e.g. location, activity) can improve on naive methods by a significant margin of ~15-20%. |
Keywords: | social media; social process; topology |
Source: | VideoLectures.NET |
Last reviewed: | 2019-04-26:lxf |
Views: | 23 |
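
The abstract describes topology-based (forest fire) sampling combined with attribute-based seed selection. The following is a minimal sketch of that idea, not the authors' implementation: it assumes an undirected graph stored as an adjacency dictionary, a hypothetical per-user activity count, and a simplified forest-fire process with a single forward-burning probability. The names `top_activity_seeds`, `forest_fire_sample`, and `p_forward` are illustrative assumptions.

```python
import random
from collections import deque


def top_activity_seeds(activity, k):
    """Pick the k most active users as seeds (hypothetical attribute-based choice)."""
    return sorted(activity, key=activity.get, reverse=True)[:k]


def forest_fire_sample(adj, seeds, target_size, p_forward=0.7, rng=None):
    """Grow a node sample with a simplified, undirected forest-fire process.

    adj: dict mapping each node to an iterable of neighbours (all present as keys).
    seeds: preferred ignition points; random nodes are used once they run out.
    """
    rng = rng or random.Random()
    target_size = min(target_size, len(adj))
    sampled = set()
    while len(sampled) < target_size:
        # Ignite a new fire, preferring an unused seed over a random node.
        unused = [s for s in seeds if s in adj and s not in sampled]
        pool = unused or [n for n in sorted(adj) if n not in sampled]
        start = rng.choice(pool)
        sampled.add(start)
        frontier = deque([start])
        while frontier and len(sampled) < target_size:
            node = frontier.popleft()
            candidates = sorted(n for n in adj[node] if n not in sampled)
            rng.shuffle(candidates)
            # Burn a geometrically distributed number of unsampled neighbours.
            burn = 0
            while burn < len(candidates) and rng.random() < p_forward:
                burn += 1
            for neighbour in candidates[:burn]:
                if len(sampled) >= target_size:
                    break
                sampled.add(neighbour)
                frontier.append(neighbour)
    return sampled


# Toy follower graph and activity counts, invented purely for illustration.
graph = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d", "e"],
    "d": ["b", "c", "f"], "e": ["c"], "f": ["d", "g"],
    "g": ["f", "h"], "h": ["g", "i"], "i": ["h", "j"], "j": ["i"],
}
activity = {"a": 40, "b": 5, "c": 22, "d": 31, "e": 2,
            "f": 9, "g": 1, "h": 3, "i": 7, "j": 4}

seeds = top_activity_seeds(activity, k=2)
sample = forest_fire_sample(graph, seeds, target_size=3, rng=random.Random(0))
```

Here `target_size=3` corresponds to the 30% sample size mentioned in the abstract, applied to a ten-node toy graph. Swapping `top_activity_seeds` for a location-based or uniformly random choice would give the other attribute-based seeding variants the abstract lists.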