

Unlocking the potential of large prospective biobank cohorts for -omics data analysis: aspects of study design, prediction and causality
Course URL: http://videolectures.net/ESHGsymposium2016_fischer_biobank/
Instructor: Krista Fischer
Institution: University of Tartu
Date: 2016-07-18
Language: English
Course description:
The past decade has seen a tremendous increase in the availability of data from large population-based biobank cohorts. Such datasets include various types of -omics data (genomics, transcriptomics, metabolomics, etc.) as well as extensive data on participants' health, lifestyle and demographics at recruitment, and often also detailed follow-up data from electronic health registries and other databases. This talk will discuss aspects of study design and statistical analysis based on such datasets.

First, the options for analysing follow-up data will be discussed, with the aim of evaluating potential -omics-based predictive biomarkers. One important issue is the choice of timescale. Unlike in traditional survival analysis projects, the recruitment time does not mark any important event (such as diagnosis of a serious disease) in the participant's life course, so the actual follow-up time may not be the optimal timescale to use. However, this also depends on the type of biomarker considered: whether it depends on the participant's current health status (as metabolomics data do, for instance) or is determined at birth (DNA-based markers). We illustrate these concepts using both simulated data and the Estonian Biobank cohort, to understand which analysis strategy is optimal in each of the situations considered. Another issue is study design, especially in cases where only a subset of a large cohort can be selected for genotyping or another kind of sample processing to obtain the relevant -omics data. Here, the potential of the nested case-control study design will be discussed.

The second topic is the use of genetic data in personalized risk prediction. Large biobank cohorts provide data to compare and validate such predictors. For common complex diseases, the polygenic nature of the disease has to be taken into account, and multimarker scores therefore have considerably better predictive ability than any single SNP. Here it is important to reach an optimal decision on the choice of genetic markers for the score, as well as on the weights used to combine them. The concept will be illustrated using the example of Type 2 Diabetes risk prediction in the Estonian Biobank data.

Finally, some aspects of causal modeling will be discussed. The availability of large cohorts has encouraged many researchers to use Mendelian Randomization methodology to estimate the causal effects of different lifestyle and clinical parameters on outcomes. However, causal inference techniques always rely on some untestable assumptions, and these are often forgotten. We discuss whether it is possible to distinguish between alternative causal scenarios (such as mediation and pleiotropy) in the case of one genetic and two non-genetic variables.
Keywords: biobank cohort data; causal modeling; non-genetic variables
Source: VideoLectures.NET
Data collected: 2021-12-22:zkj
Last reviewed: 2021-12-22:zkj
Views: 52