Robust PCA and Collaborative Filtering: Rejecting Outliers, Identifying Manipulators
Course URL: http://videolectures.net/nipsworkshops2010_caramanis_rcf/
Lecturer: Constantine Caramanis
Institution: University of Texas
Date: 2011-01-13
Language: English
Course description: Principal Component Analysis (PCA) is one of the most widely used techniques for dimensionality reduction. Nevertheless, it is plagued by sensitivity to outliers; finding robust analogs, particularly for high-dimensional data, is critical. We discuss the challenges posed by the high-dimensional setting, where the dimensionality is of the same order as, or greater than, the number of samples. We detail why existing techniques fail (indeed, no known algorithm provides provable bounds for any constant fraction of outliers) and then present two very different algorithms for High Dimensional Robust PCA. Our first algorithm achieves a breakdown point of 50%, the best possible for any algorithm and a stark improvement over the previous best-known result of 0%. Our second algorithm is based on ideas from convex optimization and, in addition to recovering the principal components, is also able to identify the corrupted points. We extend this to the partially observed setting, significantly extending matrix completion results to the setting of corrupted rows or columns.
Keywords: principal component analysis; outliers; high-dimensional data; convex optimization; matrix completion
Source: VideoLectures.NET
Last reviewed: 2020-06-02, 毛岱琦 (volunteer course editor)
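
The description's claim that classical PCA is sensitive to outliers can be illustrated with a small experiment. The sketch below is not taken from the lecture: the synthetic data, the sizes `n` and `d`, and the helper names `leading_direction` and `angle_deg` are all assumptions chosen for illustration. It computes the top principal direction with an SVD, then corrupts a single sample and measures how far the estimated direction moves.

```python
import numpy as np

def leading_direction(X):
    """Top principal direction of the rows of X (classical PCA via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]                                 # first right singular vector

def angle_deg(u, v):
    """Angle in degrees between two directions, ignoring sign."""
    c = abs(float(u @ v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(min(1.0, c))))

# Hypothetical synthetic data: strong variance along the first axis,
# plus small isotropic noise in every coordinate.
rng = np.random.default_rng(0)
n, d = 200, 50
true_dir = np.zeros(d)
true_dir[0] = 1.0
X = 5.0 * rng.standard_normal((n, 1)) * true_dir + 0.1 * rng.standard_normal((n, d))

# Corrupt one sample out of 200: move it far out along an orthogonal axis.
X_bad = X.copy()
X_bad[0] = 0.0
X_bad[0, 1] = 1000.0

clean_angle = angle_deg(leading_direction(X), true_dir)
bad_angle = angle_deg(leading_direction(X_bad), true_dir)
print(f"angle to true direction, clean data:  {clean_angle:.2f} degrees")
print(f"angle to true direction, one outlier: {bad_angle:.2f} degrees")
```

With these assumed settings, a single corrupted sample is enough to pull the estimated direction almost orthogonal to the true one, which is the sense in which classical PCA has a breakdown point of 0%. The example uses fewer dimensions than samples for simplicity; the lecture's focus is the harder regime where the dimensionality is comparable to or larger than the number of samples.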