Variant prioritization by genomic data fusion
Course URL: http://videolectures.net/mlpmsummerschool2013_moreau_genomic_data...
Instructor: Yves Moreau
Institution: University of Leuven
Course date: 2014-03-13
Course language: English
Course description: Next-generation sequencing (NGS) has rapidly increased our ability to discover the causes of many previously unresolved rare monogenic disorders by sequencing rare exomic variation. However, after standard filtering for nonsynonymous single nucleotide variants (nSNVs) and loss-of-function mutations that are absent from healthy populations or unaffected samples, many candidate mutations typically remain, and predictive methods are needed to prioritize variants for further validation. Several computational methods have been proposed that use the biochemical, evolutionary, and structural properties of mutations to assess their potential deleteriousness. However, most of these methods suffer from high false positive rates when predicting the impact of rare nSNVs. A plausible explanation for this poor performance is that many of the variants they flag are mildly deleterious but not specific to the disease of interest. We therefore propose a genomic data fusion methodology that integrates multiple strategies to detect the deleteriousness of mutations and prioritizes them in a phenotype-specific manner. A key innovation is that we incorporate a computational method for gene prioritization, which scores mutated genes by their similarity to known disease genes by fusing heterogeneous genomic information. We also integrate haploinsufficiency prediction scores, which estimate the probability that the function of a gene is affected when it is present in a functionally haploid state. To fuse these data sources, we develop a machine-learning model trained on disease-causing mutations from the Human Gene Mutation Database (HGMD), compared against three control sets: common polymorphisms and two independent sets of rare variation. Benchmarking on HGMD demonstrates that this integrative, phenotype-specific variant prioritization significantly outperforms state-of-the-art predictors such as SIFT or PolyPhen-2.
Keywords: exome variants; monogenic disorders; gene prioritization
Course source: VideoLectures.NET
Data collected: 2020-12-21:yxd
Last reviewed: 2020-12-21:yxd
View count: 120
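
Illustrative sketch: the course description outlines fusing three kinds of evidence per variant (a deleteriousness score, a phenotype-specific gene prioritization score, and a haploinsufficiency score) in a machine-learning model and then ranking candidates. The description does not say which model is used, so the plain logistic regression below is only a stand-in. The feature layout, all numeric values, the variant names, and the helper functions are hypothetical and serve only to show the idea of training on disease-causing versus control variants and ranking by the fused score.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_fusion_model(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def fused_score(features, weights, bias):
    """Probability-like score that a variant is disease-causing."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, features)))

def prioritize(candidates, weights, bias):
    """Return candidate names sorted by descending fused score."""
    scored = [(fused_score(f, weights, bias), name) for name, f in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical toy data. Columns (all assumed to lie in [0, 1]):
# [deleteriousness, gene prioritization, haploinsufficiency]
train_x = [
    [0.9, 0.8, 0.7],  # disease-causing
    [0.8, 0.9, 0.6],  # disease-causing
    [0.7, 0.7, 0.8],  # disease-causing
    [0.8, 0.2, 0.3],  # control: looks deleterious, gene not phenotype-relevant
    [0.3, 0.3, 0.2],  # control
    [0.2, 0.1, 0.4],  # control
]
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train_fusion_model(train_x, train_y)

candidates = {
    "var_a": [0.85, 0.85, 0.7],
    "var_b": [0.9, 0.15, 0.2],
    "var_c": [0.1, 0.2, 0.1],
}
ranking = prioritize(candidates, weights, bias)
print(ranking)
```

In this toy setup, var_b has the highest deleteriousness score but sits in a gene with low phenotype relevance, which mirrors the false-positive problem the description attributes to deleteriousness-only predictors.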