Making full use of the genetic information carried by dense single-nucleotide polymorphisms (SNPs) in large pedigrees is a prominent and difficult problem in biomedicine and genetics, and an important topic of the post-genomic era. It requires building appropriate formal models, analyzing the relationships among SNP, gene, and phenotype data, and identifying the causal genes and pathogenesis of complex human diseases. This project addresses large-pedigree genotype data with missing dense SNPs. First, it proposes an accurate method for inferring missing alleles based on allele list pairs, together with an approximate method for assigning genotypes based on conditional probability; both provide more accurate data for genetic linkage and association analysis. Second, building on multi-marker linkage analysis of SNP data from large pedigrees, it uses the haplotype configurations with the highest likelihoods and their corresponding joint conditional probabilities to construct a weighted estimator of the identity-by-descent (IBD) matrix between any two pedigree members at any chromosomal position, which provides an important basis for studying the inheritance modes of complex diseases. Third, to verify the accuracy of the IBD matrix computation, it puts forward two trait-locus mapping methods for complex diseases, a two-step variance component method and an extreme sib-pair method, offering a new approach to gene mapping of complex diseases in human genetics research. The effectiveness and accuracy of these three stages are verified on sequence data from simulated and real large pedigrees.
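The weighted IBD estimate described above can be illustrated with a minimal sketch. It assumes that each retained haplotype configuration comes with a joint conditional probability and with its own pairwise IBD-sharing matrix; the function name `weighted_ibd`, the input layout, and the renormalization over the retained configurations are illustrative assumptions, not the project's published implementation.

```python
def weighted_ibd(configs):
    """Average per-configuration IBD matrices, weighted by probability.

    configs: list of (probability, ibd_matrix) pairs, where ibd_matrix is a
    square list of lists holding the IBD-sharing proportion between pedigree
    members i and j under that haplotype configuration (hypothetical layout).
    """
    total = sum(p for p, _ in configs)
    if total <= 0:
        raise ValueError("configuration probabilities must sum to a positive value")
    n = len(configs[0][1])
    result = [[0.0] * n for _ in range(n)]
    for p, matrix in configs:
        weight = p / total  # assumption: renormalize over the retained top configurations
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * matrix[i][j]
    return result


# Two siblings at one chromosomal position, two retained configurations.
config_a = (0.6, [[1.0, 1.0], [1.0, 1.0]])  # sibs share both alleles IBD
config_b = (0.2, [[1.0, 0.5], [0.5, 1.0]])  # sibs share one allele IBD
ibd = weighted_ibd([config_a, config_b])
```

Renormalizing over the retained configurations keeps the estimate on the same scale as the per-configuration matrices even though low-likelihood configurations are discarded.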
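This project studied the theory, analysis methods, and applications of allele inference, approximate genotype assignment, haplotype configuration generation, identity-by-descent (IBD) matrix computation, and multi-locus mapping for biological sequence data from large pedigrees. On the theoretical side, drawing on the laws of inheritance, we proposed allele inference methods for information extraction and related gene mapping on both sparse and dense single-nucleotide polymorphism (SNP) data; discussed an approximate genotype assignment method based on linkage disequilibrium information among marker loci and conditional probability; explored the theory and methods of haplotype configuration generation using bounded depth-first search, branch and bound, and parameter-controlled constraints; established a configuration-weighted IBD matrix computation method and incorporated expectations into best linear unbiased prediction (BLUP) and the EM algorithm, thereby improving haplotype frequency estimation; and generalized reliable information statistics from complex networks, such as high-order degree and mixed degree, analyzing these statistics in the IBD matrix to offer new ideas and methods for multi-trait-locus mapping and gene mapping of complex diseases. On the applied side, we investigated the processing and modeling of massive data in genome-wide association studies; selected tag SNP sets that do not require distinguishing common from rare variants; comprehensively integrated the combined mean-variance signal of DNA methylation data; incorporated the heterogeneity and strong correlation that characterize methylation data; explored data-driven identification of differentially methylated regions; and advanced research on the relationships between gene and phenotype data. Analyzing the relationships among SNP, gene, and phenotype data and identifying the causal genes and pathogenesis of complex human diseases provides more accurate data for linkage and association analysis in genetics and an important basis for studying the inheritance modes of complex diseases. The methods constructed here for multi-locus linkage IBD computation at arbitrary chromosomal positions and for trait-locus mapping offer new ideas and methods for mapping genes related to complex diseases in human genetics research.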
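As a point of reference for the EM-based haplotype frequency estimation mentioned above, the following is a minimal textbook sketch for two biallelic SNPs in unrelated individuals. It is not the project's pedigree-aware, BLUP-integrated procedure; the function name `em_haplotype_freqs`, the genotype coding (0, 1, or 2 copies of allele 1 at each SNP), and the fixed iteration count are assumptions made for illustration.

```python
def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g1, g2), each the count (0, 1, 2) of allele 1 at a SNP.
    Returns a dict mapping haplotype (a, b), with a, b in {0, 1}, to frequency.
    """
    freqs = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # alleles carried per genotype
    for _ in range(n_iter):
        counts = {h: 0.0 for h in freqs}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10.
                p_cis = freqs[(0, 0)] * freqs[(1, 1)]
                p_trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = p_cis + p_trans
                w = p_cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is unambiguous when at most one SNP is heterozygous.
                a1, a2 = alleles[g1]
                b1, b2 = alleles[g2]
                counts[(a1, b1)] += 1
                counts[(a2, b2)] += 1
        # M-step: expected haplotype counts become the new frequencies.
        n_haplotypes = 2 * len(genotypes)
        freqs = {h: c / n_haplotypes for h, c in counts.items()}
    return freqs


# Toy sample: only the two double heterozygotes have ambiguous phase.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
freqs = em_haplotype_freqs(sample)
```

In this toy sample the unambiguous genotypes carry only the 00 and 11 haplotypes, so the EM iterations push the double heterozygotes toward the 00/11 phase.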
Data last updated: 2023-05-31
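Genome-wide association analysis of leaf orientation value in maize
The eddy covariance technique and its application to flux research in terrestrial ecosystems
Preparation and properties of a light- and electricity-driven biochar/stearic acid composite phase change material
CFRP strengthening of fatigue cracks at rib-to-deck joints in orthotropic steel bridge decks
Hardware Trojans: research progress on key issues and new trends
Mapping genes for aberrantly glycosylated IgA1 in familial IgA nephropathy by quantitative trait locus linkage analysis
Theory and methodology of multi-locus linkage analysis in autotetraploid species
Computational methods for mining sequence motifs of transcription factor binding sites
Mapping of heterotic loci for yield traits in barley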