Impact analysis of data placement strategies on query efforts in distributed RDF stores |
|
Course URL: | http://videolectures.net/iswc2018_janke_impact_distributed_rdf/ |
Lecturer: | Daniel Janke |
Institution: | Institute for Web Science and Technologies (WeST), University of Koblenz-Landau |
Date: | 2018-11-22 |
Language: | English |
Course description: | In recent years, scalable RDF stores in the cloud have been developed, in which graph data is distributed over compute and storage nodes to scale query processing and memory capacity. One main challenge in these RDF stores is the data placement strategy, which can be formalized in terms of graph covers. These graph covers determine whether (a) the triples are distributed evenly over all storage nodes (storage balance), (b) different query results can be computed on several compute nodes in parallel (vertical parallelization), and (c) individual query results can be produced from triples assigned to only a few (ideally one) storage nodes (horizontal containment). We analyse the impact of the three most commonly used graph cover strategies in these terms and find that balancing the query workload reduces query execution time more than reducing data transfer over the network. To this end, we present our novel benchmark and open-source evaluation platform Koral. |
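Keywords: | scalable RDF stores in the cloud; commonly used graph cover strategies; data transfer over the network |
Source: | VideoLectures.NET |
Data collected: | 2022-12-20:cyh |
Last reviewed: | 2022-12-23:cyh |
Views: | 8 |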