Discovery of Non-induced Patterns from Sequences |
Course URL: | http://videolectures.net/prib2010_wong_dnps/ |
Lecturer: | Andrew K. C. Wong |
Institution: | University of Waterloo |
Date: | 2010-10-14 |
Language: | English |
Course description: | Discovering patterns in sequence data has significant impact in genomics, proteomics, and business. A commonly encountered problem is that the discovered patterns contain many redundancies: spurious significant patterns whose apparent significance is induced by their strongly significant subpatterns. The concept of statistically induced patterns is proposed to capture these redundancies, and an algorithm is then developed to efficiently discover non-induced significant patterns from large sequence datasets. For performance evaluation, two experiments were conducted to demonstrate a) the seriousness of the problem, using synthetic data, and b) that the top non-induced significant patterns discovered in Saccharomyces cerevisiae (yeast) do correspond to transcription factor binding sites identified by biologists. The experiments confirm the effectiveness of our method in producing a relatively small set of patterns that reveal interesting, previously unknown information inherent in the sequences. |
Keywords: | sequence data; genome; proteomics |
Source: | VideoLectures.NET |
Last edited: | 2019-09-14 (lxf) |
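The description does not spell out how an induced pattern is detected, so the following is only a hypothetical illustration of the idea, not the authors' algorithm. It scores every length-k substring against an assumed i.i.d. symbol background using a Poisson-style standardized residual, then labels a significant pattern "induced" when its excess count largely disappears once the observed count of its (k-1)-long prefix or suffix is taken into account. The function names (`count_patterns`, `symbol_freqs`, `residual`, `classify`), the default threshold of 3.0, the background model, and the prefix/suffix conditioning rule are all assumptions made for this sketch.

```python
import random
from collections import Counter
from math import sqrt


def count_patterns(seqs, k):
    """Count overlapping occurrences of every length-k substring."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def symbol_freqs(seqs):
    """Estimate single-symbol probabilities (the assumed i.i.d. background)."""
    totals = Counter()
    for s in seqs:
        totals.update(s)
    n = sum(totals.values())
    return {c: v / n for c, v in totals.items()}


def residual(observed, expected):
    """Standardized residual under a Poisson approximation (an assumption)."""
    return (observed - expected) / sqrt(expected) if expected > 0 else 0.0


def classify(seqs, k, threshold=3.0):
    """Label each significant length-k pattern 'induced' or 'non-induced'.

    Hypothetical rule: a significant pattern is 'induced' when its count is
    no longer surprising given the observed count of its (k-1)-long prefix
    or suffix, i.e. a significant subpattern already explains it.
    """
    assert k >= 2, "need a (k-1)-long subpattern to condition on"
    freqs = symbol_freqs(seqs)
    obs_k = count_patterns(seqs, k)
    obs_sub = count_patterns(seqs, k - 1)
    windows = sum(max(len(s) - k + 1, 0) for s in seqs)
    labels = {}
    for p, o in obs_k.items():
        prob = 1.0
        for c in p:
            prob *= freqs[c]
        if residual(o, windows * prob) < threshold:
            continue  # not significant against the background model
        # Expected count given each subpattern's observed count
        # (ignores the small boundary effect at sequence ends).
        e_prefix = obs_sub[p[:-1]] * freqs[p[-1]]
        e_suffix = obs_sub[p[1:]] * freqs[p[0]]
        explained = min(residual(o, e_prefix), residual(o, e_suffix)) < threshold
        labels[p] = "induced" if explained else "non-induced"
    return labels


# Demo on synthetic data: motif "ACGT", followed by a cycling letter,
# planted at a fixed offset inside uniform random DNA.
rng = random.Random(0)
demo = [
    "".join(rng.choice("ACGT") for _ in range(20))
    + "ACGT" + "ACGT"[i % 4]
    + "".join(rng.choice("ACGT") for _ in range(19))
    for i in range(300)
]
labels4 = classify(demo, 4)
labels5 = classify(demo, 5)
print(labels4.get("ACGT"), labels5.get("ACGTA"))
```

On this synthetic data the planted motif `ACGT` should come out as non-induced, while `ACGTA` is significant against the background only because it contains `ACGT`, so the sketch should flag it as induced. That is the kind of redundancy the description refers to. Conditioning on the observed subpattern count, rather than on the background alone, is what separates the two cases in this illustration.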