

Parallel Structural Graph Clustering
Course URL: http://videolectures.net/ecmlpkdd2011_seeland_clustering/
Lecturer: Madeleine Seeland
Institution: Technical University of Munich
Date: Unknown.
Language: English
Abstract: We address the problem of clustering large graph databases according to scaffolds (i.e., large structural overlaps) that are shared between cluster members. In previous work, an online algorithm was proposed for this task that produces overlapping (non-disjoint) and non-exhaustive clusterings. In this paper, we parallelize this algorithm to take advantage of high-performance parallel hardware and further improve it in three ways: (1) a refined cluster membership test based on a set abstraction of graphs; (2) sorting graphs by size, to avoid cluster membership tests in the first place; and (3) the definition of a cluster representative once the cluster scaffold is unique, to avoid comparisons with all cluster members. In experiments on a large database of chemical structures, we show that running times can be reduced by a large factor for one parameter setting used in previous work. For harder parameter settings, results could be obtained within reasonable time for 300,000 structures, compared to 10,000 structures in previous work. This shows that structural, scaffold-based clustering of smaller libraries for virtual screening is already feasible.
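The abstract names the refined membership test and the size ordering only at a high level. The following is a minimal sketch of how such cheap pre-checks can prune expensive subgraph comparisons, under assumptions that are not taken from the paper: graph size is measured in edges, a graph may join a cluster only if it shares a scaffold of at least theta * |E(G)| edges with it, and the "set abstraction" is modeled as a multiset of labeled edges. All names (`edge_multiset`, `may_join`, `size_bound_allows`, `theta`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical representation: a labeled edge is (endpoint label, edge label, endpoint label).
Edge = Tuple[str, str, str]


def edge_multiset(edges: Iterable[Edge]) -> Counter:
    """Assumed set abstraction of a graph: multiset of canonically ordered labeled edges."""
    # Order the endpoint labels so an undirected edge has a single canonical form.
    return Counter((min(a, b), e, max(a, b)) for a, e, b in edges)


def may_join(graph: Counter, cluster_abstraction: Counter, theta: float) -> bool:
    """Necessary (not sufficient) condition for cluster membership under the assumed criterion.

    Any common subgraph of the graph and the cluster scaffold uses only labeled edges
    present in both multisets, so the size of the multiset intersection bounds the
    scaffold size from above. If the bound is below theta * |E(G)|, the expensive
    subgraph isomorphism test can be skipped.
    """
    size = sum(graph.values())
    upper_bound = sum((graph & cluster_abstraction).values())  # Counter & = elementwise min
    return upper_bound >= theta * size


def size_bound_allows(scaffold_size: int, graph_size: int, theta: float) -> bool:
    """Size-only pre-check under the same assumption.

    A new scaffold is a common subgraph of the current scaffold and G, so it has at most
    scaffold_size edges. If graphs are processed in ascending size order, a cluster that
    fails this check once fails it for every later (larger or equal) graph.
    """
    return scaffold_size >= theta * graph_size


# Usage with toy "molecules" (edge labels: "ar" aromatic, "-" single bond).
ring = edge_multiset([("C", "ar", "C")] * 6)
ring_with_oh = edge_multiset([("C", "ar", "C")] * 6 + [("O", "-", "C")])
chain = edge_multiset([("C", "-", "C")] * 5)

print(may_join(ring_with_oh, ring, theta=0.8))  # 6 of 7 edges can be shared: True
print(may_join(chain, ring, theta=0.8))         # no labeled edge in common: False
print(size_bound_allows(6, 10, theta=0.8))      # 6 < 8: False
```

Both checks only rule candidates out; a graph that passes them would still need the actual (expensive) common-subgraph test.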
Keywords: network analysis; cluster analysis; computer science; machine learning; unsupervised learning
Source: VideoLectures.NET
Last edited: 2019-12-05 (cwx)