0


统计杠杆和改进的矩阵算法

Statistical Leverage and Improved Matrix Algorithms
课程网址: http://videolectures.net/icml09_mahoney_itslima/  
主讲教师: Michael Mahoney
开课单位: 斯坦福大学
开课时间: 2009-08-26
课程语种: 英语
中文简介:
给定mxn矩阵A和秩参数k,将第i行的杠杆定义为投影矩阵的第i个对角线元素到A的前k个左奇异向量的跨度上。在这种情况下,“高杠杆“行在顶部奇异向量中具有不成比例的大量”质量“。从历史上看,这种统计概念(及其概括)已在诊断回归分析中得到广泛应用。最近,这个概念在开发用于机器学习和数据分析中具有广泛应用的几个基本矩阵问题的改进随机算法中也是核心。将描述使用统计杠杆来改进矩阵算法的最坏情况分析的两个例子。第一个问题是最小二乘近似问题,其中有n个约束和d个变量。可追溯到高斯和勒让德的经典算法使用O(nd2)时间。我们描述了一种随机算法,该算法仅使用O(n d log d)时间来计算相对误差,即1 / epsilon近似值。第二个问题是从m×n矩阵中选择精确k列的“好”集合的问题,并且Gu和Eisenstat的算法提供了先前存在的最佳结果。我们描述了一个改进结果的两阶段算法。还将简要描述最近在现代大规模机器学习和数据分析中应用统计杠杆思想。事实证明,这个概念在大型数据应用中特别富有成效,在这些应用中,出于计算原因而进行关于执行什么计算的建模决策,而不是对数据满足这些计算中隐含的统计假设的任何现实希望。
课程简介: Given an m x n matrix A and a rank parameter k, define the leverage of the i-th row of A to be the i-th diagonal element of the projection matrix onto the span of the top k left singular vectors of A. In this case, "high leverage" rows have a disproportionately large amount of the "mass" in the top singular vectors. Historically, this statistical concept (and generalizations of it) has found extensive applications in diagnostic regression analysis. Recently, this concept has also been central in the development of improved randomized algorithms for several fundamental matrix problems that have broad applications in machine learning and data analysis. Two examples of the use of statistical leverage for improved worst-case analysis of matrix algorithms will be described. The first problem is the least squares approximation problem, in which there are n constraints and d variables. Classical algorithms, dating back to Gauss and Legendre, use O(nd2) time. We describe a randomized algorithm that uses only O(n d log d) time to compute a relative-error, i.e., 1+/-epsilon, approximation. The second problem is the problem of selecting a "good" set of exactly k columns from an m x n matrix, and the algorithm of Gu and Eisenstat provides the best previously existing result. We describe a two-stage algorithm that improves on their result. Recent applications of statistical leverage ideas in modern large-scale machine learning and data analysis will also be briefly described. This concept has proven to be particularly fruitful in large data applications where modeling decisions regarding what computations to perform are made for computational reasons, as opposed to having any realistic hope that the statistical assumptions implicit in those computations are satisfied by the data.
关 键 词: 机器学习; 数据分析; 基本矩阵问题
课程来源: 视频讲座网
最后编审: 2019-04-23:lxf
阅读次数: 89