0


图形采样和保持:大图形分析的框架

Graph Sample and Hold: A Framework for Big-Graph Analytics
课程网址: http://videolectures.net/kdd2014_ahmed_big_graph_analytics/  
主讲教师: Nesreen K.Ahmed
开课单位: 普渡大学
开课时间: 2014-10-07
课程语种: 英语
中文简介:
采样是大图分析的标准方法;其目的是通过参考整个群体的样本来有效地估计图的财产。假设一个完美的样本反映了整个群体的每个属性。不幸的是,这样一个完美的样本很难在复杂的人群中收集,如图表(如网络图、社交网络等),其中底层网络连接着人群的单位。因此,一个好的样本将具有代表性,即感兴趣的图形财产可以以已知的精度进行估计。虽然以前的工作主要关注用于估计某些图形财产的采样方案(例如三角形计数),但当我们需要用相同的采样方案估计各种图形财产时,我们对这种情况知之甚少。在本文中,我们提出了一个用于大图分析的通用流采样框架,称为图采样和保持(gSH)。首先,所提出的框架在保持小状态的同时,一次一条边地从海量图中顺序采样。然后我们展示了如何从样本中为各种图财产生成无偏估计。假设图分析算法将在样本上运行,而不是在整个总体上运行,则这些算法的运行时复杂性保持在控制之下。此外,在图财产的估计是无偏的情况下,逼近误差是可控的。最后,我们展示了所提出的框架(gSH)在各种类型的图(如社交图等)上的性能。
课程简介: Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks etc), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes used to estimate certain graph properties (e.g. triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH). To begin, the proposed framework samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state. We then show how to produce unbiased estimators for various graph properties from the sample. Given that the graph analysis algorithms will run on a sample instead of the whole population, the runtime complexity of these algorithm is kept under control. Moreover, given that the estimators of graph properties are unbiased, the approximation error is kept under control. Finally, we show the performance of the proposed framework (gSH) on various types of graphs, such as social graphs, among others.
关 键 词: 大图分析; 无偏估计; 分析算法
课程来源: 视频讲座网
数据采集: 2023-03-04:chenjy
最后编审: 2023-05-11:chenjy
阅读次数: 39