CatchSync: Catching Synchronized Behavior in Large Directed Graphs
Course URL: http://videolectures.net/kdd2014_jiang_catchsync/
Lecturer: Meng Jiang
Institution: Tsinghua University
Date: 2014-10-07
Language: English

Course description: Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes, judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake followers, manipulating the social network, to botnet members performing distributed denial-of-service attacks, disturbing the network traffic graph. We propose a fast and effective method, CatchSync, which exploits two of the tell-tale signs left in graphs by fraudsters: (a) synchronized behavior: suspicious nodes have extremely similar behavior patterns, because they are often required to perform some task together (such as following the same user); and (b) rare behavior: their connectivity patterns are very different from those of the majority. We introduce novel measures to quantify both concepts ("synchronicity" and "normality") and we propose a parameter-free algorithm that works on the resulting synchronicity-normality plots. Thanks to careful design, CatchSync has the following desirable properties: (a) it is scalable to large datasets, being linear in the graph size; (b) it is parameter-free; and (c) it is side-information-oblivious: it can operate using only the topology, without needing labeled data or timing information, while still being capable of using side information if available. We applied CatchSync to two large real datasets, a 1-billion-edge Twitter social graph and a 3-billion-edge Tencent Weibo social graph, and to several synthetic ones; CatchSync consistently outperforms existing competitors, both in detection accuracy (by 36% on Twitter and 20% on Tencent Weibo) and in speed.
Keywords: real datasets; synthetic data; Twitter social graph
Course source: VideoLectures.NET
Data collected: 2021-06-09: zyk
Last reviewed: 2021-06-09: zyk