0


真实世界标记网络的q-Grams节点相似性

Node Similarity with q-Grams for Real-World Labeled Networks
课程网址: http://videolectures.net/kdd2018_conte_similarity_q_grams_network...  
主讲教师: Alessio Conte
开课单位: 比萨大学
开课时间: 2018-11-23
课程语种: 英语
中文简介:
我们使用在通往节点的有界长度q的路径中发现的标签序列来研究标记网络中的节点相似性。(这让人想起了基于Jaccard距离的文档相似性中使用的q图。)当应用于网络时,挑战是双重的:从标记路径生成的q图的数量随着q呈指数增长,并且应该考虑它们的频率:这导致了Jaccard指数的变化,即多集的Bray-Curtis指数。我们描述了nSimGram,这是一套用于与q-gram节点相似性的快速算法,基于颜色编码、概率计数、草图和字符串算法的新混合,其中要采样的元素的范围是指数的。我们提供了实验证据,证明我们的措施是有效的,并且我们的运行时间可以扩展到处理大型现实世界网络。
课程简介: We study node similarity in labeled networks, using the label sequences found in paths of bounded length q leading to the nodes. (This recalls the q-grams employed in document resemblance, based on the Jaccard distance.) When applied to networks, the challenge is two-fold: the number of q-grams generated from labeled paths grows exponentially with q, and their frequency should be taken into account: this leads to a variation of the Jaccard index known as Bray-Curtis index for multisets. We describe nSimGram, a suite of fast algorithms for node similarity with q-grams, based on a novel blend of color coding, probabilistic counting, sketches, and string algorithms, where the universe of elements to sample is exponential. We provide experimental evidence that our measure is effective and our running times scale to deal with large real-world networks.
关 键 词: 节点的有界长度q; 真实世界标记网络; q-Grams节点相似性
课程来源: 视频讲座网
数据采集: 2023-03-23:cyh
最后编审: 2023-03-23:cyh
阅读次数: 49