Mixture models on graphs
Course URL: http://videolectures.net/pmnp07_sanguinetti_mmog/
Lecturer: Guido Sanguinetti
Institution: University of Sheffield
Date: 2007-09-07
Language: English
Course description: One of the most fundamental challenges in the analysis of 'omics data sets is clustering the relevant quantities (gene transcripts, protein levels, etc.) into distinct groups. One of the simplest instances arises when comparing data obtained under two different conditions, where the basic task is to assess whether a quantity is upregulated, downregulated or unregulated. This task has traditionally been addressed using t-statistics or, from a probabilistic point of view, mixture models, with each mixture component representing one of the three states of regulation. This approach tacitly assumes that the measurements are drawn independently from the same mixture distribution. However, it is well known that biological quantities (genes, enzymes, etc.) are not independent: they are linked in an often very complex network of interactions at various levels. It is therefore reasonable to use the available network structure (and weighting) information to obtain a more accurate inference of the expression state. This can also help in finding subnetworks that exhibit coherent behaviour, giving rise to testable biological predictions. In this contribution, we introduce a probabilistic model that implements mixture models on a graph. The graph structure is encoded in a set of conditional prior distributions over the latent class memberships. This formulation leads naturally to a Gibbs sampling approach. We present preliminary results on synthetic and real data where gene expression is modelled as a mixture of a Gaussian and two exponential distributions.
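The idea described above can be illustrated with a minimal sketch. It is not the lecturer's implementation, and several details are assumptions made only for illustration: the conditional prior is taken to be a Potts-style neighbour vote with a hypothetical strength `beta`, the component parameters `sigma` and `rate` are fixed rather than inferred, and the names `log_likelihood` and `gibbs_sample` are invented here. The unregulated class is a zero-mean Gaussian, and the up- and downregulated classes are exponentials on the positive and negative half-lines.

```python
import numpy as np

DOWN, UNREG, UP = 0, 1, 2  # the three regulation states


def log_likelihood(x, sigma=0.5, rate=0.5):
    """Log-density of one observation under each class (assumed fixed parameters)."""
    ll = np.full(3, -np.inf)
    # Unregulated: zero-mean Gaussian.
    ll[UNREG] = -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    if x > 0:
        ll[UP] = np.log(rate) - rate * x      # exponential on x > 0
    elif x < 0:
        ll[DOWN] = np.log(rate) + rate * x    # exponential on -x > 0
    return ll


def gibbs_sample(x, neighbours, beta=1.0, n_sweeps=200, burn_in=50, seed=0):
    """Estimate posterior class probabilities per node by Gibbs sampling.

    neighbours[i] is the list of node indices adjacent to node i.
    The conditional prior p(z_i = k | z_neighbours) is assumed proportional
    to exp(beta * number of neighbours currently in class k).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    z = np.full(n, UNREG)                     # start with every node unregulated
    counts = np.zeros((n, 3))
    loglik = np.array([log_likelihood(xi) for xi in x])
    for sweep in range(n_sweeps):
        for i in range(n):
            if neighbours[i]:
                votes = np.bincount(z[neighbours[i]], minlength=3)
            else:
                votes = np.zeros(3)
            logp = loglik[i] + beta * votes   # likelihood times conditional prior
            p = np.exp(logp - logp.max())
            p /= p.sum()
            z[i] = rng.choice(3, p=p)
        if sweep >= burn_in:
            counts[np.arange(n), z] += 1      # record the state after burn-in
    return counts / counts.sum(axis=1, keepdims=True)
```

Calling `gibbs_sample(x, neighbours)` with a vector of log-ratios and an adjacency list returns one row of class probabilities per node. A fuller treatment would also sample the component parameters and could use edge weights in the prior, both of which this sketch leaves out.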
Keywords: omics data sets; probability; mixture models
Source: VideoLectures.NET
Last reviewed: 2020-01-13: chenxin
Views: 42