Reconstructing Data Perturbed by Random Projections when the Mixing Matrix is Known
Course URL: http://videolectures.net/ecmlpkdd09_sang_rdprpmmk/
Lecturer: Yingpeng Sang
Institution: University of Adelaide
Date: 2009-10-20
Language: English
Abstract: Random Projection (RP) has drawn great interest in privacy-preserving data mining research because of its high efficiency and security. It was proposed in \cite{Liu}: the original data set, composed of $m$ attributes, is multiplied by a mixing matrix of dimensions $k \times m$ ($m > k$) that is random and orthogonal in expectation, and the resulting $k$ series of perturbed data are released for mining purposes. To our knowledge, little work has been done from the attacker's point of view, that is, on reconstructing the original data to obtain sensitive information given the data perturbed by RP and some prior knowledge, e.g. the mixing matrix and the means and variances of the original data. When the attributes of the original data are mutually independent and sparse, the reconstruction can be treated as a problem of Underdetermined Independent Component Analysis (UICA), but UICA has permutation and scaling ambiguities. In this paper we propose a reconstruction framework based on UICA, together with some techniques to reduce these ambiguities. Cases in which the attributes of the original data are correlated and not sparse are also common in data mining. For the typical case of a Multivariate Gaussian Distribution, we also propose a reconstruction method based on Maximum A Posteriori (MAP) estimation. Our experiments show that our reconstructions achieve high recovery rates and outperform reconstructions based on Principal Component Analysis (PCA).
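To make the attack setting in the abstract concrete, the following is a minimal sketch in Python with NumPy; it is not the authors' implementation. It assumes the mixing matrix has i.i.d. Gaussian entries with variance $1/k$ (one common way to obtain a matrix that is orthogonal in expectation), that the perturbation is noiseless ($y = Rx$), and that the attacker knows the full covariance matrix of the original data in addition to the means. Under those assumptions the posterior of a multivariate Gaussian $x$ given $y$ is Gaussian, so the MAP estimate coincides with the conditional mean $\mu + \Sigma R^T (R \Sigma R^T)^{-1} (y - R\mu)$. The function names `random_projection_matrix`, `map_reconstruct`, and `demo` are hypothetical, and the pseudoinverse estimate is included only as a prior-free baseline, not as the PCA-based method mentioned in the abstract.

```python
import numpy as np


def random_projection_matrix(k, m, rng):
    """Return a k x m matrix with i.i.d. N(0, 1/k) entries.

    With this scaling (an assumption of this sketch), E[R.T @ R] = I_m,
    i.e. the matrix is orthogonal in expectation.
    """
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))


def map_reconstruct(Y, R, mu, Sigma):
    """MAP estimate of X from Y = R @ X when the columns of X ~ N(mu, Sigma).

    Y is k x n (one perturbed record per column), R is k x m,
    mu has length m, Sigma is m x m. Returns an m x n array.
    """
    S = R @ Sigma @ R.T                       # covariance of y, k x k
    gain = np.linalg.solve(S, R @ Sigma).T    # Sigma @ R.T @ inv(S), m x k
    return mu[:, None] + gain @ (Y - (R @ mu)[:, None])


def demo(seed=0, m=6, k=3, n=1000):
    """Perturb synthetic correlated Gaussian data and reconstruct it."""
    rng = np.random.default_rng(seed)

    # Synthetic prior: nonzero means and a correlated covariance matrix.
    mu = np.full(m, 10.0)
    A = rng.normal(size=(m, 2))
    Sigma = A @ A.T + 0.1 * np.eye(m)

    X = rng.multivariate_normal(mu, Sigma, size=n).T   # m x n original data
    R = random_projection_matrix(k, m, rng)
    Y = R @ X                                          # k x n released data

    X_map = map_reconstruct(Y, R, mu, Sigma)
    X_pinv = np.linalg.pinv(R) @ Y                     # baseline without prior

    # Mean squared reconstruction error per record.
    err_map = np.mean(np.sum((X - X_map) ** 2, axis=0))
    err_pinv = np.mean(np.sum((X - X_pinv) ** 2, axis=0))
    return X, Y, R, X_map, err_map, err_pinv


if __name__ == "__main__":
    _, _, _, _, err_map, err_pinv = demo()
    print("MAP error per record:          ", err_map)
    print("Pseudoinverse error per record:", err_pinv)
```

Both estimates reproduce the released data exactly when projected again with `R`; they differ only in how they fill in the $m - k$ directions that the projection discards, which is where the attacker's prior knowledge matters.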
Keywords: random projection; mixing matrix; reconstruction method
Source: VideoLectures.NET
Data collected: 2021-03-30:nkq
Last reviewed: 2021-09-20:zyk
Views: 37