RDF2Vec: RDF Graph Embeddings for Data Mining |
|
Course URL: | http://videolectures.net/iswc2016_ristoski_rdf_graph/ |
Lecturer: | Petar Ristoski |
Institution: | School of Business Informatics and Mathematics, University of Mannheim |
Date: | 2016-11-10 |
Language: | English |
Course description: | Linked Open Data has been recognized as a valuable source for background information in data mining. However, most data mining tools require features in propositional form, i.e., a vector of nominal or numerical features associated with an instance, while Linked Open Data sources are graphs by nature. In this paper, we present RDF2Vec, an approach that takes language modeling approaches for unsupervised feature extraction from sequences of words and adapts them to RDF graphs. We generate sequences by leveraging local information from graph sub-structures, harvested by Weisfeiler-Lehman Subtree RDF Graph Kernels and graph walks, and learn latent numerical representations of entities in RDF graphs. Our evaluation shows that such vector representations outperform existing techniques for the propositionalization of RDF graphs on a variety of different predictive machine learning tasks, and that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks. |
Keywords: | data mining; linked data; numerical features; language modeling |
Source: | VideoLectures.NET |
Data collected: | 2023-04-24:chenxin01 |
Last reviewed: | 2023-05-18:chenxin01 |
Views: | 32 |
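
The course description mentions generating sequences from graph walks before learning entity representations. The following is a minimal sketch of that walk-generation step only, under stated assumptions: the toy triples, the `ex:` prefix, and the function names `build_adjacency` and `graph_walks` are hypothetical and do not come from the lecture or from any RDF2Vec implementation. The Weisfeiler-Lehman subtree kernel variant and the embedding training itself are not shown.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not real DBpedia or Wikidata data.
TRIPLES = [
    ("ex:Mannheim", "ex:locatedIn", "ex:Germany"),
    ("ex:Germany", "ex:partOf", "ex:Europe"),
    ("ex:Mannheim", "ex:hasUniversity", "ex:UniMannheim"),
    ("ex:UniMannheim", "ex:locatedIn", "ex:Germany"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def graph_walks(adjacency, start, depth):
    """Enumerate all walks from `start` that follow at most `depth` edges.

    Each walk is a token sequence: entity, predicate, entity, ...
    A walk that reaches a node without outgoing edges is kept as-is.
    """
    walks = [[start]]
    for _ in range(depth):
        extended = []
        for walk in walks:
            edges = adjacency.get(walk[-1], [])
            if not edges:
                extended.append(walk)  # dead end: nothing to append
                continue
            for predicate, obj in edges:
                extended.append(walk + [predicate, obj])
        walks = extended
    return walks


if __name__ == "__main__":
    adjacency = build_adjacency(TRIPLES)
    for walk in graph_walks(adjacency, "ex:Mannheim", depth=2):
        print(" -> ".join(walk))
```

Each resulting token sequence plays the role of a sentence. In a full pipeline, sequences like these would presumably be passed to a word2vec-style language model so that every entity token receives a latent vector. Exhaustive enumeration is used here only to keep the sketch deterministic; on large graphs the number of walks grows quickly with depth, so sampling a subset of walks is a common alternative.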