

Randomization or Condensation? Linear-Cost Matrix Sketching Via Cascaded Compression Sampling
Course URL: http://videolectures.net/kdd2017_zhang_linear_cost_matrix/
Lecturer: Kai Zhang
Institution: Temple University
Date: 2017-10-09
Language: English
Course description: Matrix sketching aims to find a compact representation of a matrix while preserving most of its properties, and it is a fundamental building block of modern scientific computing. Randomized algorithms represent the state of the art and have attracted huge interest from machine learning, data mining, and theoretical computer science. However, they still require the entire input matrix to produce the desired factorizations, which can be a major computational and memory bottleneck in truly large problems. In this paper, we uncover an interesting theoretical connection between matrix low-rank decomposition and lossy signal compression, based on which a cascaded compression sampling framework is devised to approximate an m-by-n matrix in only O(m+n) time and space. Indeed, the proposed method accesses only a small number of matrix rows and columns, which significantly improves the memory footprint. Meanwhile, by sequentially teaming two rounds of approximation procedures and upgrading the sampling strategy from uniform probability to more sophisticated, encoding-oriented sampling, a significant algorithmic boost is achieved that uncovers more granular structures in the data. Empirical results on a wide spectrum of real-world, large-scale matrices show that, while taking only linear time and space, the accuracy of our method rivals that of state-of-the-art randomized algorithms consuming a quadratic, O(mn), amount of resources.
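
The description above outlines the method only at a high level: two rounds of row-and-column sampling, with the second round upgraded from uniform to encoding-oriented probabilities. As a rough illustration only (not the authors' implementation), the NumPy sketch below builds a CUR-style approximation from uniformly sampled rows and columns and then re-samples columns with residual-driven probabilities as a stand-in for the encoding-oriented sampling; the function name cascaded_sketch and the parameters c1, c2, seed are invented for this example.

import numpy as np

def cascaded_sketch(A, c1=50, c2=50, seed=0):
    """Hypothetical two-round sampling sketch of an m x n matrix A.

    Round 1 samples rows and columns uniformly and builds a coarse
    CUR-style approximation A ~= C @ U @ R.  Round 2 re-samples columns
    with probabilities driven by the round-1 residual and refines the
    sketch.  Only sampled rows and columns of A are ever read.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    c1 = min(c1, m, n)

    # Round 1: uniform sampling of c1 columns and c1 rows.
    cols = rng.choice(n, size=c1, replace=False)
    rows = rng.choice(m, size=c1, replace=False)
    R = A[rows, :]                       # c1 x n  (sampled rows)
    W = A[np.ix_(rows, cols)]            # c1 x c1 intersection block
    U = np.linalg.pinv(W)                # CUR "link" matrix

    # Round 2: residual-driven column probabilities, estimated only on
    # the sampled rows so the full matrix is never formed.
    R_hat = (R[:, cols] @ U) @ R         # round-1 reconstruction of the sampled rows
    resid = np.linalg.norm(R - R_hat, axis=0) ** 2 + 1e-12
    p = resid / resid.sum()
    cols2 = rng.choice(n, size=min(c2, n), replace=False, p=p)

    # Refined sketch built from the union of the two sampled column sets.
    all_cols = np.union1d(cols, cols2)
    C = A[:, all_cols]                   # m x |all_cols|
    U = np.linalg.pinv(A[np.ix_(rows, all_cols)])
    return C, U, R                       # A ~= C @ U @ R

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((2000, 40)) @ rng.standard_normal((40, 1500))  # low-rank test matrix
    C, U, R = cascaded_sketch(A)
    print("relative error:", np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))

Because the round-2 residual is estimated only on the rows already sampled in round 1, the refinement never forms the full matrix and still touches only a small number of rows and columns, mirroring the linear-cost claim in the description.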
Keywords: cascaded compression sampling; matrix sketching; data science
Source: VideoLectures.NET (视频讲座网)
Data collected: 2023-12-27: wujk
Last reviewed: 2023-12-27: wujk
Views: 19