Large Scale Rule-Based Reasoning Using a Laptop
Course URL: http://videolectures.net/eswc2015_peters_reasoning/
Lecturer: Martin Peters
Institution: Dortmund University of Applied Sciences
Date: 2015-07-15
Language: English
Description: Although recent developments have shown that it is possible to reason over large RDF datasets with billions of triples in a scalable way, reasoning remains a challenging task given the growing amount of available semantic data. Currently, reasoner implementations that can process large-scale datasets usually rely on MapReduce and run on a cluster of computing nodes. In this paper we address this situation by identifying the resource-consuming parts of the reasoning process and providing a more memory-efficient implementation. As a basis we use a rule-based reasoner concept from our previous work. Specifically, we introduce an approach for a memory-efficient implementation of the RETE algorithm. Furthermore, we introduce a compressed triple-index structure that can be used to identify duplicate triples and needs only a few bytes to represent a triple. Based on these concepts we show that it is possible to apply all RDFS rules to more than 1 billion triples on a single laptop, reaching a throughput that is comparable to or even higher than that of state-of-the-art MapReduce-based reasoners. Thus, we show that the resources needed for large-scale lightweight reasoning can be massively reduced.
Keywords: semantic data; resource consumption; duplicate detection
Source: VideoLectures.NET
Data collected: 2023-03-06:chenjy
Last reviewed: 2023-03-06:chenjy
Views: 26