Reconstructing Data Perturbed by Random Projections when the Mixing Matrix is Known
Course URL: http://videolectures.net/ecmlpkdd09_sang_rdprpmmk/
Lecturer: Yingpeng Sang
Institution: University of Adelaide
Date: 2009-10-20
Language: English
Course description: Random Projection (RP) has drawn great interest in privacy-preserving data mining research because of its efficiency and security. In the scheme proposed in \cite{Liu}, an original data set with $m$ attributes is multiplied by a $k \times m$ mixing matrix ($m > k$) that is random and orthogonal in expectation, and the resulting $k$ series of perturbed data are released for mining. To our knowledge, little work has taken the attacker's view: reconstructing the original data to recover sensitive information, given the RP-perturbed data and some prior knowledge such as the mixing matrix and the means and variances of the original data. When the attributes of the original data are mutually independent and sparse, reconstruction can be treated as an Underdetermined Independent Component Analysis (UICA) problem, although UICA suffers from permutation and scaling ambiguities. In this paper we propose a UICA-based reconstruction framework together with techniques to reduce these ambiguities. Data whose attributes are correlated and not sparse are also common in data mining; for the typical case of a multivariate Gaussian distribution, we propose a reconstruction method based on Maximum A Posteriori (MAP) estimation. Our experiments show that these reconstructions achieve high recovery rates and outperform reconstructions based on Principal Component Analysis (PCA).
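The Gaussian case in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard Gaussian conditioning formula, which coincides with the MAP estimate when the prior is multivariate Gaussian and the perturbed data are an exact linear function of the original data. The function names `perturb` and `map_reconstruct`, the entry variance $1/k$ for the mixing matrix (chosen so that it is orthogonal in expectation), and all data below are assumptions made for illustration only.

```python
import numpy as np


def perturb(X, R):
    """Random-projection perturbation: map m x n data X to k x n data Y = R X."""
    return R @ X


def map_reconstruct(Y, R, mu, Sigma):
    """Estimate X from Y = R X, assuming X ~ N(mu, Sigma) and R is known.

    With a Gaussian prior and a noiseless linear observation, the posterior
    mode equals the conditional mean:
        x_hat = mu + Sigma R^T (R Sigma R^T)^{-1} (y - R mu)
    """
    G = R @ Sigma @ R.T                     # k x k covariance of the perturbed data
    # K = Sigma R^T G^{-1}, computed with a linear solve (G and Sigma are symmetric).
    K = np.linalg.solve(G, R @ Sigma).T     # m x k
    return mu[:, None] + K @ (Y - (R @ mu)[:, None])


# Hypothetical data: m = 4 correlated attributes, k = 3 released series.
rng = np.random.default_rng(0)
m, k, n = 4, 3, 2000
A = rng.normal(size=(m, m))
Sigma = A @ A.T + np.eye(m)                 # a valid (positive definite) covariance
mu = rng.normal(size=m)
X = rng.multivariate_normal(mu, Sigma, size=n).T          # m x n original data

# Assumed mixing matrix: i.i.d. N(0, 1/k) entries, so that E[R^T R] = I_m.
R = rng.normal(scale=1.0 / np.sqrt(k), size=(k, m))

Y = perturb(X, R)
X_hat = map_reconstruct(Y, R, mu, Sigma)

print("mean squared error, MAP estimate :", np.mean((X - X_hat) ** 2))
print("mean squared error, prior mean   :", np.mean((X - mu[:, None]) ** 2))
```

The estimate always reproduces the released data exactly (`R @ X_hat` equals `Y` up to rounding), so what the attacker gains beyond the prior mean comes from combining the known mixing matrix with the known mean and covariance.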
Keywords: random projection; privacy preservation; data mining
Source: VideoLectures.NET
Last reviewed: 2019-03-27:lxf
Views: 55