Semi-Supervised Graph Embedding Scheme with Active Learning (SSGEAL): Classifying High Dimensional Biomedical Data
Course URL: http://videolectures.net/prib2010_lee_ssge/
Lecturer: George Lee
Institution: Rutgers, The State University of New Jersey
Date: 2010-10-14
Language: English
Abstract: In this paper, we present a new dimensionality reduction (DR) method, SSGEAL, which integrates Graph Embedding (GE) with semi-supervised and active learning to provide a low-dimensional data representation that allows for better class separation. Unsupervised DR methods such as Principal Component Analysis and GE have previously been applied to classifying high-dimensional biomedical datasets (e.g. DNA microarrays and digitized histopathology) in the reduced-dimensional space. However, these methods do not incorporate class label information, which often leads to embeddings with significant overlap between the data classes. Semi-supervised dimensionality reduction (SSDR) methods, which use both labeled and unlabeled instances to learn the optimal low-dimensional embedding, have recently been proposed. In several problems involving biomedical data, however, obtaining class labels may be difficult and/or expensive. SSGEAL requests labels only for instances that a support vector machine based active learning algorithm identifies as "hard to classify", and uses those labels to drive an updated SSDR scheme while reducing labeling cost. Real-world biomedical data from 7 gene expression studies and 3900 digitized images of prostate cancer needle biopsies were used to show the superior performance of SSGEAL compared to both GE and SSAGE (a recently popular SSDR method) in terms of both the Silhouette Index (SI) (SI = 0.35 for GE, SI = 0.31 for SSAGE, and SI = 0.50 for SSGEAL) and the Area Under the Receiver Operating Characteristic Curve (AUC) for a Random Forest classifier (AUC = 0.85 for GE, AUC = 0.93 for SSAGE, and AUC = 0.94 for SSGEAL).
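The abstract compares embeddings by their Silhouette Index, so a small illustration of that metric may help. The following is a minimal sketch of the standard silhouette definition with Euclidean distance, not the authors' evaluation code; the function name `silhouette_index` and the two toy 2-D "embeddings" are assumptions made up for this example and are unrelated to the study's data.

```python
import math

def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Hypothetical helper for illustration; values near 1 mean compact,
    well-separated classes, values near or below 0 mean overlapping classes.
    """
    n = len(points)
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    scores = []
    for i in range(n):
        # Mean distance from point i to the members of each class,
        # excluding point i itself.
        mean_dist = {}
        for c in classes:
            dists = [math.dist(points[i], points[j])
                     for j in range(n) if labels[j] == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        own = labels[i]
        if own not in mean_dist:
            scores.append(0.0)  # singleton class: silhouette taken as 0
            continue
        a = mean_dist[own]                                     # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)   # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

# Toy embeddings (assumed data): two classes far apart vs. interleaved.
labels = [0, 0, 1, 1]
separated = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
overlapping = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0), (1.5, 0.0)]

si_separated = silhouette_index(separated, labels)
si_overlapping = silhouette_index(overlapping, labels)
print(f"separated:   SI = {si_separated:.3f}")
print(f"overlapping: SI = {si_overlapping:.3f}")
```

Under this definition, an embedding whose classes overlap scores lower than one whose classes form distinct clusters, which is the sense in which a higher SI indicates better class separation.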
Keywords: dimensionality reduction; graph embedding; low-dimensional data
Source: VideoLectures.NET
Last reviewed: 2019-09-14 (lxf)