

Learning Relational Bayesian Classifiers from RDF Data
Course URL: http://videolectures.net/iswc2011_lin_rdfdata/
Lecturer: Harris Lin
Institution: Iowa State University
Date: 2011-11-25
Language: English
Course description: The increasing availability of large RDF datasets offers an exciting opportunity to build predictive models from such data using machine learning algorithms. However, the massive size and distributed nature of RDF data call for approaches that learn from RDF data in a setting where the data can be accessed only through a query interface, e.g., the SPARQL endpoint of the RDF store. In applications where the data are subject to frequent updates, there is a need for algorithms that allow the predictive model to be updated incrementally in response to changes in the data. Furthermore, in some applications, the attributes that are relevant for specific prediction tasks are not known a priori and hence need to be discovered by the algorithm. We present an approach to learning Relational Bayesian Classifiers (RBCs) from RDF data that addresses such scenarios. Specifically, we show how to build RBCs from RDF data using statistical queries through the SPARQL endpoint of the RDF store. We compare the communication complexity of our algorithm with that of an algorithm that requires direct centralized access to the data and hence has to retrieve the entire RDF dataset from the remote location for processing. We establish the conditions under which RBC models can be updated incrementally in response to the addition or deletion of RDF data. We show how our approach can be extended to the setting where the attributes relevant for prediction are not known a priori, by selectively crawling the RDF data for attributes of interest. We provide an open-source implementation and evaluate the proposed approach on several large RDF datasets.
Keywords: RDF datasets; machine learning; Bayesian classifiers
Source: VideoLectures.NET
Last reviewed: 2019-05-05:lxf
Views: 42
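
The description above rests on two ideas: a Bayesian classifier can be built from aggregate counts alone, and a model stored as counts can be updated when data are added or deleted. The Python sketch below illustrates both ideas under stated assumptions and is not the lecture's implementation. The class `CountBasedClassifier`, its methods, and the toy data are hypothetical; an in-memory list stands in for the counts that a SPARQL endpoint would return; and each value of a multi-valued attribute is treated as an independent draw, which is a simplification chosen for this sketch.

```python
import math
from collections import Counter


class CountBasedClassifier:
    """Naive-Bayes-style model stored purely as counts (hypothetical helper)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing       # Laplace smoothing constant, must be > 0
        self.class_counts = Counter()    # label -> number of instances
        self.value_counts = Counter()    # (attr, value, label) -> occurrences
        self.attr_totals = Counter()     # (attr, label) -> values seen in total

    def update(self, label, attrs, sign=1):
        """Add (sign=1) or retract (sign=-1) the counts of one instance."""
        self.class_counts[label] += sign
        for attr, values in attrs.items():
            for value in values:
                self.value_counts[(attr, value, label)] += sign
                self.attr_totals[(attr, label)] += sign

    def _labels(self):
        # Labels that still have at least one instance.
        return [c for c, n in self.class_counts.items() if n > 0]

    def _num_values(self, attr):
        # Distinct values of attr that still have a positive count.
        return len({v for (a, v, _), n in self.value_counts.items()
                    if a == attr and n > 0})

    def log_score(self, label, attrs):
        """Smoothed log P(label) plus the sum of log P(value | label).

        Assumes the model holds at least one training instance.
        """
        s = self.smoothing
        labels = self._labels()
        total = sum(self.class_counts[c] for c in labels)
        score = math.log((self.class_counts[label] + s) / (total + s * len(labels)))
        for attr, values in attrs.items():
            # The +1 reserves probability mass for values never seen in training.
            denom = self.attr_totals[(attr, label)] + s * (self._num_values(attr) + 1)
            for value in values:
                score += math.log((self.value_counts[(attr, value, label)] + s) / denom)
        return score

    def predict(self, attrs):
        return max(self._labels(), key=lambda c: self.log_score(c, attrs))


# Toy instances, made up for illustration: a label plus multi-valued attributes.
training = [
    ("high", {"genre": ["drama"], "actorAward": ["yes", "yes"]}),
    ("high", {"genre": ["drama", "comedy"], "actorAward": ["yes", "no"]}),
    ("low", {"genre": ["action"], "actorAward": ["no", "no"]}),
    ("low", {"genre": ["action", "comedy"], "actorAward": ["no"]}),
]

model = CountBasedClassifier()
for label, attrs in training:
    model.update(label, attrs)

query = {"genre": ["drama"], "actorAward": ["yes"]}
print(model.predict(query))

# Incremental update: add an instance, then retract the same instance.
extra = ("low", {"genre": ["horror"], "actorAward": ["no"]})
model.update(*extra)
model.update(*extra, sign=-1)
```

Because the model keeps only integer counts, retracting an instance restores exactly the counts held before it was added. In a remote setting, each counter would be filled from an aggregate query result rather than from a local loop, so only counts, not triples, would cross the network.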