WORQ: Workload-Driven RDF Query Processing
Course URL: http://videolectures.net/iswc2018_madkour_woqr_workload_driven/
Lecturer: Amgad Madkour
Institution: Department of Computer Science, Purdue University
Date: 2018-11-22
Language: English
Course description: Cloud-based systems provide a rich platform for managing large-scale RDF data. However, the distributed nature of these systems introduces several performance challenges, e.g., disk I/O and network shuffling overhead, especially for RDF queries that involve multiple join operations. To alleviate these challenges, this paper studies the effect of several optimization techniques that enhance the performance of RDF queries. Based on the query workload, reduced sets of intermediate results (or reductions, for short) that are common for certain join pattern(s) are computed. Furthermore, these reductions are not computed beforehand, but are rather computed only for the frequent join patterns in an online fashion using Bloom filters. Rather than caching the final results of each query, we show that caching the reductions allows reusing intermediate results across multiple queries that share the same join patterns. In addition, we introduce an efficient solution for RDF queries with unbound properties. Based on a realization of the proposed optimizations on top of Spark, extensive experimentation using two synthetic benchmarks and a real dataset demonstrates how these optimizations lead to an order of magnitude enhancement in terms of preprocessing, storage, and query performance compared to the state-of-the-art solutions.
Keywords: cloud-based systems; RDF queries with unbound properties
Source: VideoLectures.NET
Data collected: 2023-01-16:cyh
Last reviewed: 2023-01-16:cyh
Views: 37