0


用马尔可夫链蒙特卡罗方法进行代表性子抽样

Representative Subgraph Sampling using Markov Chain Monte Carlo Methods
课程网址: http://videolectures.net/mlg08_borgwardt_rss/  
主讲教师: Karsten Michael Borgwardt
开课单位: 马克斯普朗克研究所
开课时间: 2008-08-25
课程语种: 英语
中文简介:
生物信息学和互联网不断地生成带有数千个节点的图形数据。大多数用于数据分析的传统图算法对于分析这些大型图来说都太慢了。解决这个问题的一种方法是从原始大图中抽取一个较小的代表性子图。现有的代表性子图抽样算法要么随机选择节点集或边,要么探索随机抽取节点的附近。所有这些现有的方法都没有利用原始图的拓扑性质,而是提供了很好的样本,样本大小约为原始图中节点数量的15%。本文在大都会算法和模拟退火算法的基础上,提出了一种新的代表性子图抽样方法。关键的思想是找到一个子图,它保留了原始图的属性,这些属性对于计算或近似都是有效的。在我们的实验中,我们改进了Leskovec和Falofs的开创性工作(kdd 2006),通过生产代表性的子图样本,这些子图样本比文献中其他方法生产的样本更小,质量更高。
课程简介: Bioinformatics and the Internet keep generating graph data with thousands of nodes. Most traditional graph algorithms for data analysis are too slow for analyzing these large graphs. One way to work around this problem is to sample a smaller ‘representative subgraph’ from the original large graph. Existing representative subgraph sampling algorithms either randomly select sets of nodes or edges, or they explore the vicinity of a randomly drawn node. All these existing approaches do not make use of topological properties of the original graph and provide good samples down to sample sizes of approximately 15% of the number of nodes in the original graph. In this article, we propose novel sampling methods for representative subgraph sampling, based on the Metropolis algorithm and Simulated Annealing. The key idea is to find a subgraph that preserves properties of the original graph that are efficient to compute or to approximate. In our experiments, we improve over the pioneering work of Leskovec and Faloutsos (KDD 2006), by producing representative subgraph samples that are both smaller and of higher quality than those produced by other methods from the literature.
关 键 词: 生物信息学; 拓扑性质; 子采样; 模拟退火算法
课程来源: 视频讲座网
最后编审: 2020-05-29:吴雨秋(课程编辑志愿者)
阅读次数: 46