在大归因图快速尽力而为模式匹配Fast Best-Effort Pattern Matching in Large Attributed Graphs |
|
课程网址: | http://videolectures.net/kdd07_tong_fbep/ |
主讲教师: | Hanghang Tong |
开课单位: | 亚利桑那州立大学 |
开课时间: | 2007-08-14 |
课程语种: | 英语 |
中文简介: | 我们关注节点具有属性的大型图形, 例如一个社交网络, 其中节点使用每个人的职务标记。在这样的设置中, 我们希望找到与用户查询模式匹配的子图。例如, "明星" 查询将是, "找到与经理、律师和会计师或尽可能接近该结构的其他结构有密切互动的首席执行官"。同样, "循环" 查询可以帮助发现一个洗钱团伙。传统的基于 sql 的方法以及最近的图形索引方法在不存在完全匹配时将不返回任何答案。我们的方法可以找到精确的--以及接近匹配的方法, 它将按照我们提出的 "善良" 顺序呈现给用户。例如, 当不存在直接路径时, 我们的方法可以容忍上述示例查询的 "ceo" 和 "会计" 之间的间接路径。它的第二个特点是可伸缩性。一般情况下, 如果查询具有 nq 节点, 而数据图具有 n 个节点, 则问题需要多项式时间复杂度 o (nnq), 这是令人望而却步的。我们的 g-ray ("图 x 射线") 方法在数据图的大小上找到高质量的时间线性子图。在 dlbp 作者-出版图 (具有356k 节点和 1.9 m 边缘) 上的实验结果说明了我们的方法的有效性和可扩展性。结果与我们的直觉一致, 速度非常好。dblp 图上的4节点查询平均需要4秒。 |
课程简介: | We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person’s job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a ‘star’ query would be, “find a CEO who has strong interactions with a Manager, a Lawyer, and an Accountant, or another structure as close to that as possible”. Similarly, a ‘loop’ query could help spot a money laundering ring. Traditional SQL-based methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. Our method can find exact-, as well as near-matches, and it will present them to the user in our proposed ‘goodness’ order. For example, our method tolerates indirect paths between, say, the ‘CEO’ and the ‘Accountant’ of the above sample query, when direct paths do not exist. Its second feature is scalability. In general, if the query has nq nodes and the data graph has n nodes, the problem needs polynomial time complexity O(nnq ), which is prohibitive. Our G-Ray (“Graph X-Ray”) method finds high-quality subgraphs in time linear on the size of the data graph. Experimental results on the DLBP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average for a 4- node query on the DBLP graph. |
关 键 词: | 计算机科学; 机器学习; 模式识别 |
课程来源: | 视频讲座网 |
最后编审: | 2020-06-24:yumf |
阅读次数: | 48 |