Genome-wide association studies (GWAS) have identified more than ten thousand genetic risk variants associated with complex human phenotypes, including human diseases (e.g., diabetes, psychiatric disorders) and non-disease traits (e.g., height, weight, blood pressure). However, complete chains of causality that link genetic variants to phenotypes remain largely elusive because of the lack of cross-validation from different types of functional data.

The rise of Big Data in biomedicine offers unprecedented opportunities to build such complete chains. In contrast to conventional biomedical data, these datasets characterize biological processes at different layers, including the genome, epigenome, transcriptome, proteome, and metabolome. Integrating these multilayered data has become an essential step toward deepening our understanding of the biological basis of complex diseases.

In this research, we aim to develop statistical and computational methods for prioritizing disease-associated variants via integrative analysis of multilayered data. This research is motivated by two facts: (1) accumulating evidence suggests that different complex traits and diseases share common genetic bases, a phenomenon formally known as “pleiotropy”; and (2) functionally relevant variants have been consistently shown to be enriched among GWAS findings. Preliminary results from our pilot study suggest a substantial benefit from joint analysis of two GWAS datasets. To continue this promising research, we propose a stage-wise strategy for further developing our methods: (1) joint analysis of multiple GWAS datasets; (2) incorporation of functional annotation data into the analysis of a single GWAS dataset; and (3) joint analysis of multiple GWAS datasets with incorporation of functional annotation data.

The novelty of this research is the development of statistically rigorous and computationally efficient methods for integrating multilayered data. This helps make the most efficient use of the vast amounts of valuable data that have been generated to dissect complex disease genetics. In contrast to most existing methods, which simply combine multilayered data without considering the underlying biological processes, our proposed methods allow indirect information to be shared across layers. This will greatly facilitate biologically interpretable inference and drive new scientific insights. The statistical and computational techniques developed here are also broadly applicable to many other disciplines where diverse, rich, and multilayered data are available to address challenging scientific problems.
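As a rough illustration of stage (2), incorporating functional annotation into the analysis of a single GWAS dataset, the following minimal sketch uses a generic two-groups model in which the prior probability of association depends on whether a variant is functionally annotated. This is not the project's actual method. The Beta(alpha, 1) form for non-null p-values, the value of alpha, and both prior probabilities are assumptions chosen purely for illustration.

```python
def posterior_association(p_value, prior, alpha=0.2):
    """Posterior probability of association under a two-groups model.

    Assumed model (illustrative only):
      - null p-values are Uniform(0, 1), with density 1;
      - non-null p-values are Beta(alpha, 1) with 0 < alpha < 1,
        with density alpha * p**(alpha - 1), which favors small p-values.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    non_null = prior * alpha * p_value ** (alpha - 1.0)
    null = 1.0 - prior
    return non_null / (non_null + null)


# Hypothetical prior probabilities of association: annotated variants are
# assumed to be enriched for true signals relative to unannotated ones.
PRIOR_ANNOTATED = 0.10
PRIOR_UNANNOTATED = 0.01

if __name__ == "__main__":
    p = 1e-5  # the same GWAS p-value observed at two different variants
    print("annotated:  ", posterior_association(p, PRIOR_ANNOTATED))
    print("unannotated:", posterior_association(p, PRIOR_UNANNOTATED))
```

Under these assumptions, two variants with the same p-value receive different posterior probabilities, so the annotated variant is ranked higher. In a real analysis the priors and alpha would be estimated from the data rather than fixed by hand.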
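Genome-wide association studies have successfully identified more than ten thousand genetic variants associated with human phenotypes, including diseases (diabetes, psychiatric disorders) and non-disease traits (height, weight, blood pressure). However, for lack of cross-validation from biological data at different layers, many links in the causal chain from genetic variants to phenotypes remain unclear. Biomedical big data characterize biological processes at every layer, including the genome, epigenome, transcriptome, proteome, and metabolome. Effectively integrating multilayered data is the key to building complete causal chains.

This project is devoted to developing statistical and computational methods for the integrative analysis of multilayered data. The methods rest on two facts: (1) the pleiotropy of genetic variants (a single variant can affect multiple phenotypes); and (2) the regulatory function of non-coding genetic variants. We therefore propose a three-stage methodological study: (1) integration of genome-wide data for multiple diseases; (2) integration of genome-wide data for a single disease with biological functional data; and (3) integration of genome-wide data for multiple diseases with functional data. We expect the statistical and computational methods studied in this project to offer new ideas for other areas of big data analysis.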
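Genome-wide association studies have successfully identified more than ten thousand genetic variants associated with human phenotypes, including diseases (diabetes, psychiatric disorders) and non-disease traits (height, weight). However, for lack of cross-validation from biological data at different layers, many links in the causal chain from genetic variants to phenotypes remain unclear. Biomedical big data characterize biological processes at every layer, including the genome, epigenome, transcriptome, proteome, and metabolome. Effectively integrating multilayered data is the key to building complete causal chains.

This project is devoted to developing statistical and computational methods for the integrative analysis of multilayered data. The methods rest on two facts. First, genetic variants are pleiotropic: a single variant can affect multiple phenotypes. Second, non-coding genetic variants have regulatory functions. On this basis, we completed a three-stage methodological study: first, integration of genome-wide data for multiple diseases; second, integration of genome-wide data for a single disease with biological functional data; third, integration of genome-wide data for multiple diseases with functional data. Based on the statistical and computational methods developed in this project, we have published 14 papers in leading international journals, including The Annals of Statistics, Bioinformatics, and Proceedings of the National Academy of Sciences. Another 8 papers are under submission or revision. We also provide software for these methods, so that our data analysis results are fully reproducible. We believe these results can provide better analysis tools for more big data application areas and serve as a useful reference for related fields.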
Data last updated: 2023-05-31