AliGraph: A Comprehensive Graph Neural Network Platform |
|
Course URL: | http://videolectures.net/kdd2019_yang_network_platform/ |
Lecturer: | Hongxia Yang |
Institution: | Alibaba Group |
Published: | 2020-03-02 |
Language: | English |
Abstract: | An increasing number of machine learning tasks require dealing with large graph datasets, which capture rich and complex relationships among potentially billions of elements. Graph Neural Networks (GNNs) have become an effective way to address graph learning problems: they convert graph data into a low-dimensional space while preserving as much structural and property information as possible, and construct a neural network for training and inference. However, providing efficient graph storage and computation capabilities that facilitate GNN training and enable the development of new GNN algorithms is challenging. In this paper, we present a comprehensive graph neural network platform, namely AliGraph, which consists of distributed graph storage, optimized sampling operators, and a runtime that efficiently supports not only existing popular GNNs but also a series of in-house GNNs developed for different scenarios. The system is currently deployed at Alibaba to support a variety of business scenarios, including product recommendation and personalized search on Alibaba's e-commerce platform. In extensive experiments on a real-world dataset with 492.90 million vertices, 6.82 billion edges, and rich attributes, AliGraph builds the graph an order of magnitude faster than the state-of-the-art PowerGraph platform (5 minutes vs. hours as reported for PowerGraph). During training, AliGraph runs 40% to 50% faster with a novel caching strategy and achieves an approximately 12x speedup with the improved runtime. In addition, all of our in-house GNN models show statistically significant gains in both effectiveness and efficiency (e.g., a 4.12% to 17.19% lift in F1 score). |
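The abstract credits part of AliGraph's performance to optimized sampling operators. As background only, the following is a minimal single-machine sketch of the generic technique such operators serve in mini-batch GNN training: uniform, fixed-fanout, multi-hop neighbor sampling in the style of GraphSAGE. It is not AliGraph's API or implementation; the functions `sample_neighbors` and `multi_hop_sample`, the fanout values, and the toy graph are all hypothetical, chosen for illustration.

```python
import random
from typing import Dict, List

# Hypothetical toy representation: vertex id -> list of neighbor ids.
Graph = Dict[int, List[int]]


def sample_neighbors(graph: Graph, seeds: List[int], fanout: int,
                     rng: random.Random) -> Dict[int, List[int]]:
    """For each seed vertex, draw up to `fanout` neighbors uniformly without replacement."""
    sampled: Dict[int, List[int]] = {}
    for v in seeds:
        nbrs = graph.get(v, [])
        if len(nbrs) <= fanout:
            sampled[v] = list(nbrs)  # low-degree vertex: keep every neighbor
        else:
            sampled[v] = rng.sample(nbrs, fanout)
    return sampled


def multi_hop_sample(graph: Graph, seeds: List[int], fanouts: List[int],
                     seed: int = 0) -> List[Dict[int, List[int]]]:
    """Expand a seed batch hop by hop; returns one {vertex: sampled neighbors} dict per hop."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    layers: List[Dict[int, List[int]]] = []
    frontier = list(seeds)
    for fanout in fanouts:
        hop = sample_neighbors(graph, frontier, fanout, rng)
        layers.append(hop)
        # The next frontier is the set of distinct vertices sampled in this hop.
        seen = set()
        next_frontier: List[int] = []
        for nbrs in hop.values():
            for u in nbrs:
                if u not in seen:
                    seen.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return layers


# Usage: a 2-hop sample with fanout 2 per hop, starting from vertex 0.
g: Graph = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
layers = multi_hop_sample(g, seeds=[0], fanouts=[2, 2])
```

Capping the fanout bounds the size of each mini-batch's computation graph regardless of vertex degree, which is why fixed-size sampling is a common building block when graphs reach billions of edges.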
Keywords: | big data science; AliGraph; comprehensive graph neural network platform; large graph datasets |
Source: | VideoLectures.NET |
Data collected: | 2022-09-15: cyh |
Last edited: | 2022-09-19: cyh |