Sempala: Interactive SPARQL Query Processing on Hadoop
Course URL: http://videolectures.net/iswc2014_schaetzle_sempala/
Lecturer: Alexander Schätzle
Institution: University of Freiburg
Date: 2014-12-19
Language: English

Abstract: Driven by initiatives such as Schema.org, the amount of semantically annotated data is expected to grow steadily towards a massive scale, requiring cluster-based solutions to query it. At the same time, Hadoop has become dominant in Big Data processing, with large infrastructures already deployed and used in many application fields. For Hadoop-based applications, a common data pool (HDFS) provides many synergy benefits, making it very attractive to use these infrastructures for semantic data processing as well. Indeed, existing SPARQL-on-Hadoop (MapReduce) approaches have already demonstrated very good scalability; however, their query runtimes are rather slow because of the underlying batch processing framework. While this is acceptable for data-intensive queries, it is not satisfactory for the majority of SPARQL queries, which are typically much more selective and require only small subsets of the data. In this paper, we present Sempala, a SPARQL-over-SQL-on-Hadoop approach designed with selective queries in mind. Our evaluation shows performance improvements of an order of magnitude compared to existing approaches, paving the way for interactive-time SPARQL query processing on Hadoop.
Keywords: semantically annotated data; data-intensive queries; Big Data processing
Source: VideoLectures.NET
Data collected: 2021-06-27 (zyk)
Last edited: 2021-06-27 (zyk)