Minimax Localization of Structural Information in Large Noisy Matrices
Course URL: http://videolectures.net/nips2011_balakrishnan_matrices/
Lecturer: Sivaraman Balakrishnan
Institution: Carnegie Mellon University
Date: not available
Language: English
Abstract: We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables, such as proteins and drugs, biological species and gene sequences, or malware and signatures, is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad hoc methods are available for biclustering, theoretical analysis of the problem is largely non-existent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions: i) We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster as a function of the noise variance, the size of the matrix, and the size of the bicluster of interest. ii) We show that a combinatorial procedure based on the scan statistic achieves this optimal limit. iii) We characterize the signal-to-noise ratio (SNR) required by several computationally tractable procedures for biclustering, including element-wise thresholding, column/row average thresholding, and a convex relaxation approach to sparse singular vector decomposition.
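To make two of the tractable procedures named in the abstract concrete, here is a minimal sketch of element-wise thresholding and row/column average thresholding on a synthetic matrix. It is an illustration under stated assumptions, not the lecture's implementation: the planted-block model (a constant `mu` added to a `k1 x k2` block of Gaussian noise), the function names, the rule of keeping any row or column with at least one large entry, and the cutoff values are all hypothetical choices made for this example.

```python
import numpy as np


def make_noisy_bicluster(n1, n2, k1, k2, mu, sigma, rng):
    """Gaussian noise matrix with a raised k1 x k2 block in the top-left corner."""
    A = sigma * rng.standard_normal((n1, n2))
    A[:k1, :k2] += mu  # assumed placement; any row/column subset would do
    return A


def elementwise_threshold(A, tau):
    """Keep rows and columns that contain at least one entry above tau."""
    mask = A > tau
    return np.flatnonzero(mask.any(axis=1)), np.flatnonzero(mask.any(axis=0))


def average_threshold(A, row_tau, col_tau):
    """Keep rows and columns whose mean exceeds the given cutoffs."""
    rows = np.flatnonzero(A.mean(axis=1) > row_tau)
    cols = np.flatnonzero(A.mean(axis=0) > col_tau)
    return rows, cols


rng = np.random.default_rng(0)
n1, n2, k1, k2, mu, sigma = 50, 40, 5, 4, 20.0, 1.0
A = make_noisy_bicluster(n1, n2, k1, k2, mu, sigma, rng)

# Assumed cutoffs: halfway between the noise level and the signal level.
rows_e, cols_e = elementwise_threshold(A, tau=mu / 2)
rows_a, cols_a = average_threshold(
    A, row_tau=mu * k2 / (2 * n2), col_tau=mu * k1 / (2 * n1)
)
print("element-wise:", rows_e, cols_e)
print("averaging:   ", rows_a, cols_a)
```

The signal in this example is deliberately strong, so both procedures should recover the planted block. Lowering `mu` or changing the block size relative to the matrix is one way to explore how the two procedures begin to fail at different signal strengths, which is the kind of comparison the abstract describes.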
Keywords: clustering; computer science; machine learning; feature selection; data matrix
Source: VideoLectures.NET
Last reviewed: 2019-11-22 (cwx)