From Micro to Macro: Data Driven Phenotyping by Densification of Longitudinal Electronic Medical Records |
|
Course URL: | http://videolectures.net/kdd2014_zhou_medical_records/ |
Lecturer: | Jiayu Zhou |
Institution: | Samsung Research America |
Date: | 2014-10-07 |
Language: | English |
Course description: | Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important data source for this type of research is patient Electronic Medical Records (EMR). However, patient EMRs are typically sparse and noisy, which creates significant challenges if they are used directly to represent patient phenotypes. In this paper, we propose a data-driven phenotyping framework called Pacifier (PAtient reCord densIFIER), in which we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in the original patient EMR, and that its value evolves smoothly over time. We propose two formulations to achieve this goal. One is the Individual Basis Approach (IBA), which assumes that the phenotypes are different for every patient. The other is the Shared Basis Approach (SBA), which assumes that the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that solves both problems. Finally, we validate Pacifier on two real-world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the proposed algorithms significantly improve predictive performance in both tasks (at diagnosis group granularity, the average AUC score improved from 0.689 to 0.816 on CHF and from 0.756 to 0.838 on ESRD). We also illustrate some interesting phenotypes derived from our data. |
Keywords: | clinical data; end-stage renal disease; patient electronic medical records |
Source: | VideoLectures.NET |
Data collected: | 2021-06-09:zyk |
Last reviewed: | 2021-06-09:zyk |
Views: | 52 |
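
The description above treats each patient's record as a sparse feature-by-time matrix and recovers a dense version from its latent structure. The sketch below illustrates that general idea only. It is not the Pacifier formulation, which this page does not spell out. It assumes a simple masked factorization X ≈ U V, where U (features by phenotypes) is kept non-negative and sparse and V (phenotypes by time) is penalized for jumps between adjacent time points. The function name `densify`, the penalty weights, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def densify(X, mask, k=2, lam_sparse=0.01, lam_smooth=1.0, iters=2000, seed=0):
    """Fill in a features-by-time matrix from its observed entries.

    Illustrative model (an assumption, not the paper's objective):
        minimise 0.5 * ||mask * (U @ V - X)||^2
                 + lam_sparse * sum(U)              (U >= 0, encourages sparsity)
                 + 0.5 * lam_smooth * ||V @ D||^2   (D takes first differences in time)
    solved by alternating proximal-gradient steps on U and gradient steps on V.
    """
    rng = np.random.default_rng(seed)
    d, t = X.shape
    U = rng.random((d, k))
    V = rng.random((k, t))

    # D has shape (t, t-1); column j of V @ D equals V[:, j+1] - V[:, j].
    D = np.zeros((t, t - 1))
    idx = np.arange(t - 1)
    D[idx, idx] = -1.0
    D[idx + 1, idx] = 1.0
    DDt = D @ D.T  # its spectral norm is at most 4

    for _ in range(iters):
        # Step sizes are 1 / (upper bound on the Lipschitz constant) for stability.
        step_u = 1.0 / (np.linalg.norm(V @ V.T, 2) + 1e-8)
        grad_u = (mask * (U @ V - X)) @ V.T
        # Gradient step, then the proximal step for a non-negative L1 penalty.
        U = np.maximum(U - step_u * (grad_u + lam_sparse), 0.0)

        step_v = 1.0 / (np.linalg.norm(U.T @ U, 2) + 4.0 * lam_smooth + 1e-8)
        grad_v = U.T @ (mask * (U @ V - X)) + lam_smooth * (V @ DDt)
        V = V - step_v * grad_v
    return U, V


# Synthetic "patient": 6 features, 30 time points, 2 smooth latent phenotypes.
rng = np.random.default_rng(1)
time = np.linspace(0.0, 1.0, 30)
V_true = np.vstack([np.sin(np.pi * time), time])
U_true = np.array([[1.0, 0.0], [0.8, 0.0], [0.5, 0.5],
                   [0.0, 1.0], [0.0, 0.7], [0.3, 0.6]])
X_true = U_true @ V_true

# Keep roughly half of the entries, mimicking a sparse record.
mask = (rng.random(X_true.shape) < 0.5).astype(float)
X_obs = X_true * mask

U, V = densify(X_obs, mask)
X_hat = U @ V  # densified record

held_out = 1.0 - mask
err_model = np.linalg.norm(held_out * (X_hat - X_true))
err_zero_fill = np.linalg.norm(held_out * X_true)
print("held-out error, densified:", err_model)
print("held-out error, zero fill:", err_zero_fill)
```

In this toy setting each record gets its own `U`, which loosely mirrors the individual-basis idea. Fitting a single `U` across many patients' matrices would be the corresponding shared-basis variant, again as an assumption about how such a sketch could be extended rather than a description of the published method.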