
Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso
Course URL: http://videolectures.net/colt2013_guha_thakurta_lasso/
Lecturer: Abhradeep Guha Thakurta
Institution: Microsoft
Date: 2013-08-09
Language: English
Course description: We design differentially private algorithms for statistical model selection. Given a data set and a large, discrete collection of “models”, each of which is a family of probability distributions, the goal is to determine the model that best “fits” the data. This is a basic problem in many areas of statistics and machine learning. We consider settings in which there is a well-defined answer, in the following sense: suppose that there is a non-private model selection procedure f, which is the reference to which we compare our performance. Our differentially private algorithms output the correct value f(D) whenever f is stable on the input data set D. We work with two notions: perturbation stability and sub-sampling stability. We give two classes of results: generic ones, that apply to any function with discrete output set; and specific algorithms for the problem of sparse linear regression. The algorithms we describe are efficient and in some cases match the optimal non-private asymptotic sample complexity. Our algorithms for sparse linear regression require analyzing the stability properties of the popular LASSO estimator. We give sufficient conditions for the LASSO estimator to be robust to small changes in the data set, and show that these conditions hold with high probability under essentially the same stochastic assumptions that are used in the literature to analyze convergence of the LASSO.
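
For reference, the LASSO estimator whose stability the description discusses is the standard l1-penalized least-squares fit (one common normalization; conventions for the 1/(2n) factor vary):

\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \; \frac{1}{2n}\,\lVert y - X\theta \rVert_2^2 + \Lambda\,\lVert \theta \rVert_1

where X \in \mathbb{R}^{n \times p} is the design matrix, y \in \mathbb{R}^n the response vector, and \Lambda > 0 the regularization parameter; the selected “model” is the support (the set of nonzero coordinates) of \hat{\theta}.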
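
The sub-sampling-stability idea can be sketched in code: run the non-private selector on many random subsamples, then release the most frequent answer only if its margin over the runner-up survives a noisy test. The Python sketch below is purely illustrative; the function name, parameter defaults, and the noise/threshold calibration are placeholders of my own, not the tuned constants from the paper, so it should not be read as carrying a specific (epsilon, delta)-privacy guarantee.

import numpy as np
from collections import Counter

def private_select_by_subsampling(data, f, epsilon, m=60, q=0.1, rng=None):
    # f: non-private selection procedure; must return a hashable value,
    #    e.g. a tuple of selected feature indices.
    # m: number of random subsamples; q: subsample fraction.
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    k = max(1, int(q * n))

    # Run the non-private selector on each random subsample.
    outputs = [f([data[i] for i in rng.choice(n, size=k, replace=False)])
               for _ in range(m)]

    # Most frequent output and the runner-up's count.
    ranked = Counter(outputs).most_common(2)
    best, c1 = ranked[0]
    c2 = ranked[1][1] if len(ranked) > 1 else 0

    # Noisy margin test: release `best` only if its lead over the
    # runner-up survives Laplace noise; otherwise refuse to answer.
    # (Noise scale and threshold here are illustrative placeholders.)
    margin = (c1 - c2) / m
    noisy_margin = margin + rng.laplace(scale=1.0 / (epsilon * m))
    return best if noisy_margin > 4.0 / (epsilon * m) else None

# Toy usage: privately decide which of two one-dimensional "models"
# (positive mean vs. negative mean) better fits the sample.
data = list(np.random.default_rng(0).normal(loc=0.5, size=500))
choice = private_select_by_subsampling(
    data, lambda s: "positive" if np.mean(s) > 0 else "negative", epsilon=1.0)

The refusal output (None, standing in for the symbol ⊥) is what lets such a mechanism stay safe on unstable inputs: when f is not sub-sampling stable on the data set, the algorithm declines to answer rather than risk leaking information.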
Keywords: statistical model selection; differentially private algorithms; data set; discrete collection of models; probability distributions; asymptotic sample complexity; sparse linear regression
Source: VideoLectures.NET open courses
Last reviewed: 2020-06-08 by 吴雨秋 (volunteer course editor)
Views: 80