Sequence Kernels for Predicting Protein Essentiality
Course URL: http://videolectures.net/icml08_talwalkar_skp/  
Lecturer: Ameet Talwalkar
Institution: University of California, Berkeley
Date: 2008-08-06
Language: English
Course description: The problem of identifying the minimal gene set required to sustain life is of crucial importance in understanding cellular mechanisms and designing therapeutic drugs. This work describes several kernel-based solutions for predicting essential genes that outperform existing models while using less training data. Our first solution is based on a semi-manually designed kernel derived from the Pfam database, which includes several Pfam domains. We then present novel and general domain-based sequence kernels that capture sequence similarity with respect to several domains made of large sets of protein sequences. We show how to deal with the large size of the problem – several thousands of domains with individual domains sometimes containing thousands of sequences – by representing and efficiently computing these kernels using automata. We report results of extensive experiments demonstrating that they compare favorably with the Pfam kernel in predicting protein essentiality, while requiring no manual tuning.
Keywords: gene set; cellular mechanisms; protein sequences
Source: VideoLectures.NET
Last reviewed: 2019-04-21:lxf
Views: 54