0


通过二元矩阵分解挖掘离散模式

Mining Discrete Patterns via Binary Matrix Factorization
课程网址: http://videolectures.net/kdd09_ye_mdpbmf/  
主讲教师: Jieping Ye
开课单位: 密歇根大学
开课时间: 2009-09-14
课程语种: 英语
中文简介:
挖掘二进制数据中的离散模式对于子采样、压缩和聚类非常重要。我们考虑排名一的二进制矩阵近似, 以确定数据的主要模式, 同时保留其离散性质。对此类数据的最佳近似值具有一组最小的不一致条目, 即给定二进制数据与近似矩阵之间的不匹配。由于问题的硬度, 以前对这些问题的描述采用了启发式方法, 由此产生的近似值可能与最优近似值相去甚远。本文证明了一阶二进制矩阵近似可以重新表述为0-1 整数线性规划 (ilp)。但是, 即使对于小型矩阵, ilp 公式的计算成本也很高。我们提出了一个线性程序 (lp) 松弛, 证明它实现了保证的逼近误差约束。我们进一步扩展了建议的公式使用正则化技术, 这是常用的解决过度拟合。由于大矩阵涉及大量变量, lp 公式被限制在中等大小的矩阵中。有趣的是, 我们证明了所提出的近似公式可以转化为最小 s-t 切割问题的实例, 可以通过求最大流量来有效地解决。我们的实证研究表明了基于最大流量的算法的有效性。结果也证实了既定的理论界限。
课程简介: Mining discrete patterns in binary data is important for subsampling, compression, and clustering. We consider rank-one binary matrix approximations that identify the dominant patterns of the data, while preserving its discrete property. A best approximation on such data has a minimum set of inconsistent entries, i.e., mismatches between the given binary data and the approximate matrix. Due to the hardness of the problem, previous accounts of such problems employ heuristics and the resulting approximation may be far away from the optimal one. In this paper, we show that the rank-one binary matrix approximation can be reformulated as a 0-1 integer linear program (ILP). However, the ILP formulation is computationally expensive even for small-size matrices. We propose a linear program (LP) relaxation, which is shown to achieve a guaranteed approximation error bound. We further extend the proposed formulations using the regularization technique, which is commonly employed to address overfitting. The LP formulation is restricted to medium-size matrices, due to the large number of variables involved for large matrices. Interestingly, we show that the proposed approximate formulation can be transformed into an instance of the minimum s-t cut problem, which can be solved efficiently by finding maximum flows. Our empirical study shows the efficiency of the proposed algorithm based on the maximum flow. Results also confirm the established theoretical bounds.
关 键 词: 二进制数据的离散模式; 矩阵; 一阶二元矩阵
课程来源: 视频讲座网
最后编审: 2020-06-04:毛岱琦(课程编辑志愿者)
阅读次数: 58