In China, lung cancer is the leading cause of cancer death. In recent years, genomic studies, including genome-wide association studies (GWAS) and next-generation sequencing (NGS), have identified dozens of lung cancer susceptibility regions. In addition, other ‘-omics’ studies, such as transcriptomics and proteomics, together with integrative studies, continue to reveal further mechanisms of lung cancer pathophysiology. As a well-known complex disease, lung cancer involves multiple genes, important environmental factors, different stages of cellular processing, and complex interactions, which implies that the different ‘-omics’ data partially overlap but are mainly complementary to one another. How can informative biological knowledge be used to integrate multiple ‘-omics’ data and further improve the power to identify genes associated with lung cancer? This is the main scientific question explored in this proposal. Several integrative analysis methods have been developed in recent years, but they share common limitations: (1) requiring every sample to be assayed on multiple ‘-omics’ platforms; (2) ignoring the natural biological information that links ‘-omics’ data; or (3) only annotating or visualizing the results biologically, without statistics to describe the risk of false discovery. Scientists generally believe that the true genomic causal factors of lung cancer manifest in biological networks rather than in single genes. Any change in such a network, such as a change in the expression of a single gene or a loss of constraints between genes, would probably increase the risk of lung cancer or of poor prognosis. Using prior knowledge of biological regulatory networks, we propose an integrative analysis method that applies Bayesian propagation over the network to integrate multiple ‘-omics’ data and to identify novel lung cancer genes or networks more efficiently. This study would overcome the limitations listed above, advance statistical methodology for integrative ‘-omics’ analysis, and provide biological insight into lung cancer pathophysiology.
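As an illustration of what propagating evidence over a regulatory network can look like, the following is a minimal sketch using random walk with restart. This is a generic propagation scheme that stands in for the Bayesian method proposed here; it is not that method. The adjacency matrix, the prior gene scores, the `propagate` function, and the restart probability are all hypothetical.

```python
import numpy as np

def propagate(adj, prior, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected gene network (illustrative only).

    adj     : symmetric 0/1 adjacency matrix (hypothetical regulatory network)
    prior   : non-negative per-gene evidence, e.g. from one '-omics' layer
    restart : probability of jumping back to the prior at each step
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated genes
    W = adj / col_sums                 # column-normalised transition matrix
    p0 = np.asarray(prior, dtype=float)
    p0 = p0 / p0.sum()                 # normalise the prior to sum to 1
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical four-gene chain 0 - 1 - 2 - 3 with all prior evidence on gene 0.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
scores = propagate(chain, prior=[1, 0, 0, 0])
```

In this toy chain the scores decay with network distance from the seeded gene, which is the sense in which evidence from one ‘-omics’ layer can raise the rank of its network neighbours.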
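Lung cancer ranks first in mortality among malignant tumors in China. In recent years, genomics, transcriptomics, proteomics, and other “omics” have been further revealing the mechanisms of lung cancer development. As a well-known complex disease, lung cancer involves multiple genes, multiple types of environmental factors, and complex interactions, so that the “omics” layers both overlap and substantially complement one another. Can biological information be fully exploited to integrate multi-“omics” data and further improve the power to identify lung-cancer-associated genes? This is the main question addressed by this project. Existing integration methods have three main limitations: (1) they require every sample to have data from multiple platforms; (2) they do not make full use of biological information; or (3) they provide only functional annotation from a biological perspective and lack statistical evidence. Scientists generally believe that the true genetic causes of lung cancer may be networks made up of multiple genes, and that a change at any point in such a “functional network” may lead to lung cancer. This study proposes to integrate relatively “independent” multi-“omics” data on the basis of biological regulatory networks, making full use of biological information and performing network propagation based on Bayesian theory to identify novel associated genes or networks more efficiently. The approach can effectively overcome the limitations above and has important scientific significance and practical value.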
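Set against the background of integrative analysis of multi-omics data, this study started from biological network analysis, took the network's nodes (genes) and edges (upstream-downstream regulatory relationships) as the entry point while also considering external environmental factors, built mediation models, and evaluated them systematically. Through statistical simulation experiments, five mediation models were evaluated in depth; weighing statistical properties against analytical efficiency, the VanderWeele method is recommended. Causal-inference models for mediation analysis were studied systematically across study designs (cross-sectional, case-control, and cohort) and data types (continuous, binary, and survival data), providing a clearer scientific basis for subsequent functional studies. This line of analysis offers a new approach to multi-omics integration and to the integration of environmental exposure with omics data, and has a degree of novelty.

In parallel, the results of the statistical methodology research were applied systematically to studies of the risk and prognosis of acute respiratory distress syndrome (ARDS), a disease with high mortality. LRRC16A was identified as a gene associated with ARDS risk and prognosis. Causal inference showed that platelets play an important mediating role between genetic factors and ARDS outcomes, providing an important reference for research on ARDS therapeutic targets. The resulting series of findings was published in AJRCCM and Chest, leading journals in emergency and respiratory medicine, accompanied by editorial commentaries in the same issues.

During the project period, 10 research papers were published (including 6 SCI-indexed papers and 3 papers in Zone 1 medical journals), exceeding the planned targets, and the results were widely praised by international peers. Two software copyrights were jointly applied for and granted, providing convenient tools for integrative analysis of biomedical big data. The findings also formed part of the basis for a successful application to the National Key R&D Program, Key Special Project for Strategic International Scientific and Technological Cooperation (2016YFE0204900; the author is a core research member).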
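The kind of mediation model evaluated above can be sketched as follows for the simplest case: a continuous mediator and a continuous outcome, each fitted by linear regression, with an exposure-mediator interaction and no covariates. The closed-form natural direct and indirect effects follow the regression-based formulas associated with VanderWeele. The function names, the variable names, and the omission of covariates and confidence intervals are simplifying assumptions, and this is not the project's software.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation_effects(a, m, y, a0=0.0, a1=1.0):
    """Natural direct and indirect effects of exposure a on outcome y via mediator m.

    Assumed models (no covariates, illustrative only):
        E[M | a]    = b0 + b1*a
        E[Y | a, m] = t0 + t1*a + t2*m + t3*a*m
    Effects compare exposure level a1 with reference level a0.
    """
    a, m, y = (np.asarray(v, dtype=float) for v in (a, m, y))
    ones = np.ones_like(a)
    b0, b1 = fit_ols(np.column_stack([ones, a]), m)
    t0, t1, t2, t3 = fit_ols(np.column_stack([ones, a, m, a * m]), y)
    nde = (t1 + t3 * (b0 + b1 * a0)) * (a1 - a0)   # natural direct effect
    nie = (t2 * b1 + t3 * b1 * a1) * (a1 - a0)     # natural indirect effect
    return {"nde": nde, "nie": nie, "total": nde + nie}
```

Here `a` could stand for a genetic variant, `m` for a candidate mediator such as platelet count, and `y` for a continuous outcome; binary or survival outcomes need different outcome models and are not covered by this sketch.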
Data last updated: 2023-05-31
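Genome-wide association analysis of leaf orientation value in maize
On the influence of the big data environment on the development of information science
A survey of user alignment techniques across social networks
CFRP strengthening of rib-to-deck fatigue cracking in orthotropic steel bridge decks
Hardware Trojans: research progress on key issues and new trends
Integrative analysis of biological omics data based on chemometric methods
Reconstruction of gene regulatory networks based on the integration of multiple types of biological data
Studying the coordination of gene transcription and splicing with methods that integrate massive multi-“omics” data
Bioinformatics methods for reconstructing transcriptional and post-transcriptional regulatory networks by integrating prior information with omics data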