Adversarial Detection with Model Interpretation
Course URL: http://videolectures.net/kdd2018_liu_adversarial_interpretation/
Lecturer: Ninghao Liu
Institution: Texas A&M University
Date: 2018-11-23
Language: English
Description: Machine learning (ML) systems have been increasingly applied in web security applications such as spammer detection, malware detection and fraud detection. These applications have an intrinsic adversarial nature, where intelligent attackers can adaptively change their behavior to avoid being detected by the deployed detectors. Existing efforts against adversaries are usually limited by the type of ML model applied or by specific applications such as image classification. Additionally, the working mechanisms of ML models usually cannot be well understood by users, which in turn impedes them from understanding the vulnerabilities of the models or improving their robustness. To bridge this gap, in this paper we propose to investigate whether model interpretation could help adversarial detection. Specifically, we develop a novel adversary-resistant detection framework by utilizing the interpretation of ML models. The interpretation process explains how the target ML model makes a prediction for a given instance, thus providing more insight for crafting adversarial samples. The robustness of the detector is then improved through adversarial training with these adversarial samples. A data-driven method is also developed to empirically estimate the adversaries' costs of feature manipulation. Our approach is model-agnostic and can be applied to various types of classification models. Experimental results on two real-world datasets demonstrate the effectiveness of interpretation-based attacks and show how the estimated feature manipulation costs affect the behavior of adversaries.
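
The abstract outlines a pipeline (local interpretation of the target detector, interpretation-guided crafting of evasive samples, cost-aware feature manipulation, and adversarial retraining) without giving the exact formulation. The Python sketch below is only an illustration of that idea under stated assumptions: a LIME-style local linear surrogate stands in for the paper's interpretation method, and the function names (local_explanation, interpretation_guided_attack, adversarial_training) and the per-feature cost vector are hypothetical, not taken from the paper.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestClassifier

def local_explanation(predict_proba, x, n_samples=500, scale=0.1, seed=0):
    """Fit a local linear surrogate around x (LIME-style, an assumption:
    the paper's interpretation method may differ) and return per-feature
    weights for the 'malicious' class."""
    rng = np.random.default_rng(seed)
    perturbed = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    targets = predict_proba(perturbed)[:, 1]   # P(malicious) near x
    surrogate = Ridge(alpha=1.0).fit(perturbed, targets)
    return surrogate.coef_                      # local feature importances

def interpretation_guided_attack(predict_proba, x, cost, budget=1.0, step=0.5):
    """Greedily shift the features with the highest |weight|/cost ratio
    against their local weight, lowering P(malicious) until the
    manipulation budget is spent. 'cost' is a hypothetical per-feature
    cost vector, standing in for the paper's data-driven estimate."""
    x_adv = x.copy()
    weights = local_explanation(predict_proba, x_adv)
    order = np.argsort(-np.abs(weights) / cost)  # best benefit-per-cost first
    spent = 0.0
    for j in order:
        if spent + cost[j] * step > budget:
            break
        x_adv[j] -= np.sign(weights[j]) * step   # move against the local weight
        spent += cost[j] * step
    return x_adv

def adversarial_training(X, y, cost, rounds=3):
    """Augment training data with crafted evasions (still labeled
    malicious) and refit the detector; model-agnostic, since the attack
    only needs predict_proba."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    for _ in range(rounds):
        malicious = X[y == 1]
        adv = np.array([interpretation_guided_attack(clf.predict_proba, x, cost)
                        for x in malicious])
        X = np.vstack([X, adv])
        y = np.concatenate([y, np.ones(len(adv), dtype=int)])
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    return clf

The greedy ordering by |weight|/cost reflects the abstract's point that estimated manipulation costs shape adversary behavior: expensive features are altered last, so a well-calibrated cost model pushes attackers toward less effective perturbations.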
Keywords: machine learning; web security applications; detection framework
Source: VideoLectures.NET
Data collected: 2022-12-04 (chenjy)
Last reviewed: 2022-12-04 (chenjy)
Views: 35