SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping
Course URL: http://videolectures.net/kdd2018_perros_sustain_unsupervised/
Lecturer: Ioakim Perros
Institution: Georgia Institute of Technology
Date: 2018-11-23
Language: English
Abstract: This paper presents a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real and then apply real-valued factorizations. However, doing so fails to preserve important characteristics of the original data, which makes the results hard to interpret. Instead, our approach extracts factor values from integer datasets as scores that are constrained to take values from a small integer set. These scores are easy to interpret: a score of zero indicates no feature contribution, and higher scores indicate distinct levels of feature importance. At its core, SUSTain relies on: a) partitioning the problem into integer-constrained subproblems, so that they can be solved optimally and efficiently; and b) ordering the solution of the subproblems to promote reuse of shared intermediate results. We propose two variants, SUSTainM and SUSTainT, to handle matrix and tensor inputs, respectively. We evaluate SUSTain against several state-of-the-art baselines on both synthetic and real Electronic Health Record (EHR) datasets. Compared with those baselines, SUSTain shows either significantly better fit or orders-of-magnitude speedups at a comparable fit (up to 425× faster). We apply SUSTain to EHR datasets to extract patient phenotypes (i.e., clinically meaningful patient clusters). Furthermore, 87% of the extracted phenotypes were validated by a cardiologist as clinically meaningful phenotypes related to heart failure.
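The idea of splitting a factorization into integer-constrained subproblems that can each be solved exactly can be illustrated with a small sketch. The code below is not the SUSTain algorithm and makes several assumptions of its own: a plain model X ≈ U·Vᵀ with every factor entry restricted to {0, ..., max_score}, a squared-error objective, and one-entry-at-a-time updates. The names `integer_factorize`, `update_rows`, and `squared_error` are hypothetical. Each one-entry subproblem is a one-dimensional quadratic, so its best integer value is the real-valued minimizer rounded and clipped to the allowed range.

```python
import random


def squared_error(X, U, V):
    """Sum of squared differences between X and U @ V^T."""
    rank = len(U[0])
    return sum(
        (X[i][j] - sum(U[i][r] * V[j][r] for r in range(rank))) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )


def update_rows(X, U, V, max_score):
    """Update every entry of U in place, holding V fixed.

    Each entry is a 1-D quadratic subproblem; rounding the real-valued
    minimizer and clipping it to [0, max_score] gives its exact integer optimum.
    """
    rank = len(V[0])
    for i, x_row in enumerate(X):
        for r in range(rank):
            num = 0.0
            den = 0.0
            for j, v_row in enumerate(V):
                # Contribution of all components except r to entry (i, j).
                others = sum(U[i][s] * v_row[s] for s in range(rank) if s != r)
                num += (x_row[j] - others) * v_row[r]
                den += v_row[r] ** 2
            if den == 0:
                U[i][r] = 0  # this entry cannot affect the fit
            else:
                U[i][r] = min(max(round(num / den), 0), max_score)


def integer_factorize(X, rank, max_score, sweeps=10, seed=0):
    """Alternate exact integer updates of U and V (illustrative only)."""
    rng = random.Random(seed)
    U = [[rng.randint(0, max_score) for _ in range(rank)] for _ in X]
    V = [[rng.randint(0, max_score) for _ in range(rank)] for _ in X[0]]
    Xt = [list(col) for col in zip(*X)]  # transpose, used to update V
    history = [squared_error(X, U, V)]
    for _ in range(sweeps):
        update_rows(X, U, V, max_score)
        update_rows(Xt, V, U, max_score)
        history.append(squared_error(X, U, V))
    return U, V, history


if __name__ == "__main__":
    # Made-up count matrix, for illustration only.
    X = [[2, 0, 1], [4, 0, 2], [0, 3, 3]]
    U, V, history = integer_factorize(X, rank=2, max_score=3)
    print("U =", U)
    print("V =", V)
    print("squared error per sweep:", history)
```

Because every update picks the best feasible integer for one entry while the current value is itself feasible, the squared error never increases from one sweep to the next. The result is still only a local optimum that depends on the random start.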
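Keywords: SUSTain; scalable unsupervised scoring; application to phenotyping; extension of tensor factorization; preserving important characteristics of the original data
Source: VideoLectures.NET
Data collected: 2023-03-15:cyh
Last reviewed: 2023-03-15:cyh
Views: 14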