


Machine learning methods for effective proteomics image analysis
Course URL: http://videolectures.net/licsb2010_manolakos_mlm/
Lecturer: Elias S. Manolakos
Institution: University of Athens
Date: 2010-05-03
Language: English
Abstract: Two-dimensional gel electrophoresis (2DGE) remains the most widely used method for protein identification and differential expression analysis, owing to its lower cost and the availability of mature commercial software tools for 2DGE image analysis, even though gel-free methods are gaining in popularity. Although several software packages promise to automate the whole protein spot detection and quantification process, the hard reality today [1] is still what Fey and Larsen stated in 2001: "There is no program that is remotely automatic when presented with complex 2-DE images" ... "most programs require often more than a day of user hands-on time to edit the image before it can be fully entered into the database" [2]. To address these limitations and build an automated 2DGE image analysis workflow, in previous work we developed an image analysis methodology that first denoises the 2DGE image using the Contourlet transform [3] and then separates the parts of the denoised image that contain true protein spots (hereafter called Regions of Interest, ROIs) from background-only areas, using Active Contours (AC) without edges [4]. In this work we complete the workflow by adding a well-tuned pipeline of operations based on unsupervised machine learning methods, which analyze each isolated ROI further in order to "fish out" the centers of the individual "hidden" spots it contains and estimate their quantities. One-dimensional mixture modeling of the ROI pixel intensity histogram is applied first, to identify and remove any remaining background pixels. The surviving ROI pixels are then used as "molecule generators": through appropriate random sampling, the processed ROI image is converted into an isomorphic dataset representing the distribution of molecules of the underlying protein species (which are "projected" as spots on the gel image).
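The two per-ROI steps just described (background removal by 1D mixture modeling, then conversion of pixels into sampled "molecules") can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the ROI arrives as a dict mapping (row, col) to intensity, uses a two-component 1D Gaussian mixture fitted by expectation-maximization (EM) as a stand-in for the histogram mixture model, and draws molecules with probability proportional to pixel intensity. All function names and parameters (`fit_two_gaussians`, `remove_background`, `sample_molecules`, `n_molecules`, `seed`) are hypothetical.

```python
import math
import random


def _normal_pdf(x, mu, var):
    """Density of a 1D Gaussian with mean mu and variance var at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(values, iters=100):
    """Fit a two-component 1D Gaussian mixture to pixel intensities with EM.

    Returns (weights, means, variances), sorted so that component 0 has the
    lower mean (assumed background) and component 1 the higher (assumed spot).
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]                       # start at the intensity extremes
    var0 = (hi - lo) ** 2 / 16 or 1.0      # broad initial variance
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * _normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300       # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps a component from collapsing
    order = sorted(range(2), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [variances[k] for k in order])


def remove_background(roi):
    """Drop ROI pixels better explained by the background component.

    roi maps (row, col) -> intensity. A pixel is kept only if it is brighter
    than the background mean and the spot component's weighted density beats
    the background's, which avoids keeping dark tail pixels when the spot
    component happens to be wider.
    """
    w, mu, var = fit_two_gaussians(list(roi.values()))
    kept = {}
    for pixel, x in roi.items():
        bg = w[0] * _normal_pdf(x, mu[0], var[0])
        fg = w[1] * _normal_pdf(x, mu[1], var[1])
        if x > mu[0] and fg > bg:
            kept[pixel] = x
    return kept


def sample_molecules(pixels, n_molecules, seed=0):
    """Turn surviving pixels into a point set of hypothetical 'molecules'.

    Each draw picks a pixel with probability proportional to its intensity,
    then places the point uniformly inside that pixel's unit square.
    """
    rng = random.Random(seed)
    coords = list(pixels)
    picks = rng.choices(coords, weights=[pixels[c] for c in coords], k=n_molecules)
    return [(r + rng.random(), c + rng.random()) for r, c in picks]
```

On a toy 10x10 ROI whose background sits at intensities 10 to 12 with a 3x3 spot of intensity about 100 in the middle, `remove_background` keeps only the nine spot pixels, and `sample_molecules(kept, 500)` returns 500 points inside that 3x3 area. Subtracting the estimated background mean from the sampling weights is an alternative design choice this sketch does not make.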
This reverse-engineering step, rooted in machine learning, is a unique innovation of this work that, to the best of our knowledge, has not been applied before in 2DGE image analysis. Candidate protein spot centers are then located by applying hierarchical clustering. Finally, the individual spot boundaries are delineated by fitting 2D Gaussian models to the data, using generalized mixture modeling with the Minimum Message Length (MML) criterion to select the best model complexity. An extensive evaluation of this novel spot modeling methodology on both real and synthetic 2DGE images shows that it is more precise and more specific than PDQuest in spot detection, while both methods achieve comparably high sensitivity. Furthermore, it estimates the volumes of the extracted spots more reliably, even in the presence of substantial noise and in image regions where faint and overlapping (or saturated) spots lie close to each other. Notably, the end-to-end workflow we have developed for 2DGE image analysis does not require any re-calibration of parameters each time a new gel image is presented for analysis. This property makes it a suitable candidate for the automatic processing of image stacks, as needed in high-throughput proteomics analysis supporting systems biology projects.
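The hierarchical clustering step that locates candidate spot centers can be sketched in the same hedged way. The Python below is an illustration, not the authors' code: it assumes the sampled molecules are (x, y) tuples, uses centroid-linkage agglomerative clustering, and stops merging once the closest pair of cluster centroids is farther apart than `max_merge_dist`. That threshold and the name `agglomerate` are hypothetical, introduced only for this example; the workflow is described as needing no per-gel recalibration, so a fixed threshold like this is purely illustrative. The subsequent 2D Gaussian fitting with MML model selection is not shown.

```python
import math


def agglomerate(points, max_merge_dist):
    """Centroid-linkage agglomerative clustering of 2D points.

    Repeatedly merges the two clusters whose centroids are closest, until the
    closest pair is farther apart than max_merge_dist. Returns a list of
    (centroid, size) tuples, where size is the number of points merged.
    """
    # Each cluster is stored as [sum_x, sum_y, count] so merging is O(1).
    clusters = [[x, y, 1] for x, y in points]

    def centroid(c):
        return (c[0] / c[2], c[1] / c[2])

    while len(clusters) > 1:
        # Find the closest pair of centroids (naive O(n^2) scan).
        best = None
        for i in range(len(clusters)):
            ci = centroid(clusters[i])
            for j in range(i + 1, len(clusters)):
                d = math.dist(ci, centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_merge_dist:
            break  # remaining clusters are treated as separate spot candidates
        a, b = clusters[i], clusters[j]
        clusters[i] = [a[0] + b[0], a[1] + b[1], a[2] + b[2]]
        del clusters[j]
    return [(centroid(c), c[2]) for c in clusters]
```

Each returned centroid is a candidate spot center. If molecules were drawn in proportion to pixel intensity, a cluster's size is, in expectation, proportional to the summed intensity it captured, which gives a rough relative volume. The naive loop is cubic in the number of points, acceptable for a single small ROI but not for large samples.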
Keywords: two-dimensional gel electrophoresis; protein identification; differential expression analysis
Source: VideoLectures.NET
Last edited: 2021-01-07 (chenxin)