Robust PCA and Collaborative Filtering: Rejecting Outliers, Identifying Manipulators
Course URL: http://videolectures.net/nipsworkshops2010_caramanis_rcf/
Lecturer: Constantine Caramanis
Institution: University of Texas
Date: 2011-06-13
Language: English
Course description: Principal Component Analysis (PCA) is one of the most widely used techniques for dimensionality reduction. Nevertheless, it is plagued by sensitivity to outliers; finding robust analogs, particularly for high-dimensional data, is critical. We discuss the challenges posed by the high-dimensional setting, where the dimensionality is of the same order as, or greater than, the number of samples. We detail why existing techniques fail (indeed, no known algorithm can provide provable bounds for any constant fraction of outliers) and then present two very different algorithms for High Dimensional Robust PCA. Our first algorithm achieves a breakdown point of 50%, the best possible using any algorithm and a stark improvement over the previous best-known result of 0%. Our second algorithm is based on ideas from convex optimization and, in addition to recovering the principal components, is also able to identify the corrupted points. We extend this to the partially observed setting, significantly extending matrix completion results to the setting of corrupted rows or columns.
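The sensitivity of standard PCA to outliers can be illustrated with a small experiment. The sketch below is not taken from the lecture and implements neither of its two algorithms; it only shows, on synthetic data, how a single corrupted sample can redirect the leading principal component. The function names, dimensions, and magnitudes are all hypothetical choices made for illustration.

```python
import numpy as np


def top_principal_direction(X):
    """Return the unit-norm leading principal direction of the rows of X."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def alignment(u, v):
    """Absolute cosine between two unit vectors (1 means same direction)."""
    return abs(float(u @ v))


def outlier_sensitivity_demo(n=200, p=50, seed=0):
    """Compare plain PCA on clean data and on data with one gross outlier.

    All sizes and scales here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Ground-truth direction: the first coordinate axis.
    true_dir = np.zeros(p)
    true_dir[0] = 1.0

    # Clean samples: large variance along true_dir plus small isotropic noise.
    clean = rng.normal(size=(n, 1)) * 5.0 * true_dir
    clean += 0.1 * rng.normal(size=(n, p))

    # One corrupted sample with a very large value along a different axis.
    outlier = np.zeros((1, p))
    outlier[0, 1] = 1000.0
    corrupted = np.vstack([clean, outlier])

    clean_score = alignment(top_principal_direction(clean), true_dir)
    corrupt_score = alignment(top_principal_direction(corrupted), true_dir)
    return clean_score, corrupt_score


if __name__ == "__main__":
    clean_score, corrupt_score = outlier_sensitivity_demo()
    print(f"alignment with true direction, clean data:  {clean_score:.4f}")
    print(f"alignment with true direction, one outlier: {corrupt_score:.4f}")
```

With these settings, one outlier among 201 samples is enough to pull the leading component almost entirely away from the true direction, which is the kind of failure that motivates the robust methods described in the abstract.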
Keywords: high-dimensional data; outliers; algorithms
Source: VideoLectures.NET
Data collected: 2020-12-07: yxd
Last reviewed: 2020-12-15: cjy
Views: 38