A high throughput gene, environment and epigenetics database and analysis system for international ALS research
Course URL: http://videolectures.net/encals2017_iacoangeli_epigenetics_databa...
Lecturer: Alfredo Iacoangeli
Institution: King's College London
Date: 2017-07-21
Language: English
Course description: Genetic technology is advancing rapidly. We can now collect huge amounts of genetic information quickly and cheaply, and because of close collaboration between research groups, we can do this with tens of thousands of people. The problem is how to store, handle and easily share this information. This project is a collaboration between researchers working on motor neuron disease and computer scientists working with biological information. We aim to develop a computerized system that will let researchers easily use genetic, clinical and lifestyle information that has already been collected, and add new information as it is produced. The system will make it easy to see patterns in the relationship between clinical features, lifestyle and gene variations, to compare genetic variations between groups of people, and to share the information between research groups. We are implementing a solution that will enable the sharing of huge raw sequencing data as well as small files of summary results. For raw and processed data we use iRODS, the Integrated Rule-Oriented Data System, developed for building distributed storage infrastructure. Through data virtualization, several iRODS servers in different locations can share and manipulate their data through automatic mechanisms based on internal rules. This will facilitate sharing, curating and analysing the huge amount of data that our genetic research is producing. An iRODS system able to host and handle petabytes of genetic data has been deployed on Rosalind, our BRC/King’s College London HPC cluster. Data is accessible through both a user-friendly web browser interface and the command line. Results and summary statistics, along with the clinical data, will be loaded into the TranSMART platform.
The TranSMART system is a platform for translational medicine comprising a relational database back end and a web-based interface that integrates a large number of open source bioinformatics tools for analysis and visualization. This platform will give general users access to processed data and will allow cohort selection and analyses on the fly. We are also implementing community-driven metadata standards, along with pipelines for metadata extraction and data analysis that can be automated using iRODS. All pipelines will be available on GitHub, together with iRODS Docker images, to allow any member of the ALS research community to deploy their own iRODS instance on a timescale of hours.
Keywords: gene; genetic variation; clinical
Source: VideoLectures.NET
Data collected: 2020-12-14:yxd
Last reviewed: 2020-12-14:yxd
Views: 24