
On the Tradeoff Between Privacy and Utility in Data Publishing
Course URL: http://videolectures.net/kdd09_li_otbpud/
Lecturer: Tiancheng Li
Institution: Purdue University
Date: 2009-09-14
Language: English
Course description: In data publishing, anonymization techniques such as generalization and bucketization have been designed to provide privacy protection. At the same time, they reduce the utility of the data, so it is important to consider the tradeoff between privacy and utility. In a paper that appeared in KDD 2008, Brickell and Shmatikov proposed an evaluation methodology that compares the privacy gain with the utility gain resulting from anonymizing the data, and concluded that "even modest privacy gains require almost complete destruction of the data-mining utility". This conclusion seems to undermine existing work on data anonymization. In this paper, we analyze the fundamental characteristics of privacy and utility, and show that it is inappropriate to compare privacy with utility directly. We then observe that the privacy-utility tradeoff in data publishing is similar to the risk-return tradeoff in financial investment, and propose an integrated framework for considering the privacy-utility tradeoff, borrowing concepts from Modern Portfolio Theory for financial investment. Finally, we evaluate our methodology on the Adult dataset from the UCI machine learning repository. Our results clarify several common misconceptions about data utility and provide data publishers with useful guidelines for choosing the right tradeoff between privacy and utility.
Keywords: computer science; machine learning; semi-supervised learning
Source: VideoLectures.NET
Last reviewed: 2020-06-13, 邬启凡 (volunteer course editor)
Views: 71