Discovering Common Sequence Variation in Arabidopsis thaliana
Course URL: http://videolectures.net/mlsb07_ratsch_dcs/
Lecturer: Gunnar Rätsch
Institution: Max Planck Institute
Date: 2007-10-20
Language: English
Course description: In order to characterize natural sequence variation in 20 strains of the model plant Arabidopsis thaliana, whole-genome resequencing with high-density oligonucleotide arrays was performed in collaboration with Perlegen Sciences Inc. Array data were analyzed with a combination of existing model-based (MB; Hinds et al., Science, 2005) and novel machine learning (ML) methods. For the identification of single nucleotide polymorphisms (SNPs), we developed an algorithm based on support vector machines. Training and evaluation were done on published alignments (Nordborg et al., PLoS Biology, 2005). At the same false discovery rate (FDR) as MB, the ML algorithm identifies significantly more true SNPs, especially in regions of high polymorphism density and/or low hybridization quality. The union of SNP predictions from both methods contains on average 143,572 SNPs per strain at an FDR of 2.8% (648,570 non-redundant SNPs). Furthermore, a machine learning algorithm was developed to detect polymorphic regions containing insertions, deletions and variational hotspots, where SNP detection algorithms typically fail to identify individual SNPs. It discovers the approximate location of a substantial additional proportion of polymorphisms (54% of deleted nucleotides and 33% of insertion sites). With a combination of all three methods, 74% of SNPs can be directly called or are contained in a polymorphic region prediction (Zeller et al., in preparation). We examined the patterns of, and the forces shaping, sequence variation in Arabidopsis (Clark et al., Science, 2007): for example, significant differences were observed between gene families, and genes mediating interaction with the biotic environment harbor exceptional polymorphism levels.
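The description compares call sets by their false discovery rate and then takes the union of the MB and ML predictions. The following is a minimal sketch of that bookkeeping on made-up toy positions. It is not the authors' pipeline: the function name `false_discovery_rate` and all position values are hypothetical, and the "truth" set stands in for SNPs known from reference alignments.

```python
def false_discovery_rate(predicted, truth):
    """Fraction of predicted SNP positions that are absent from the truth set."""
    if not predicted:
        return 0.0
    false_positives = len(predicted - truth)
    return false_positives / len(predicted)


# Hypothetical toy data: genome positions of known SNPs and of two call sets.
truth = {101, 250, 399, 512, 777}
mb_calls = {101, 250, 600}            # stand-in for model-based calls
ml_calls = {101, 399, 512, 777, 900}  # stand-in for machine-learning calls

# The union keeps every position called by at least one method.
union_calls = mb_calls | ml_calls

for name, calls in [("MB", mb_calls), ("ML", ml_calls), ("union", union_calls)]:
    true_hits = len(calls & truth)
    fdr = false_discovery_rate(calls, truth)
    print(f"{name}: {len(calls)} calls, {true_hits} true, FDR = {fdr:.3f}")
```

In this toy case the union recovers all five known positions but also inherits the false positives of both methods, which is why the FDR of a combined call set has to be reported alongside its size.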
Keywords: Arabidopsis thaliana; whole-genome sequencing; sequence variation
Source: VideoLectures.NET
Last reviewed: 2020-07-30:yumf
Views: 49