Recent biological studies have revealed that most variants associated with complex traits lie in non-coding regulatory regions of the genome, where they have been shown to disrupt transcription factor (TF)-DNA binding motifs. Variable TF-DNA interactions are therefore increasingly regarded as key drivers of phenotypic variation. In this project, we will develop effective computational methods for identifying and dissecting the molecular roles of non-coding genetic variants in complex traits. First, building on recurrent neural networks and the adversarial training principle, we will develop a novel method for generating artificial sequences that can serve as high-quality negative controls for downstream analysis. Second, using machine learning techniques such as multitask learning and structured sparsity, we will propose computational approaches for predicting the motifs of heterodimers in silico. Third, we will predict unobserved high-throughput protein-DNA data by integrating collaborative filtering with recurrent neural networks. Fourth, we will study advanced weakly supervised deep learning models that exploit multilayered information from different types of genomic data to predict causal variants and their effects on TF binding more precisely. The outcomes of this project could benefit the search for disease markers and provide further theoretical support for the precise diagnosis and treatment of complex diseases.
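Recent biological studies have shown that most genetic variants with a significant statistical association with traits occur in non-coding regions of the genome. These variants may affect the ability of transcription factors (TFs) to recognize and bind DNA sequences, so variation in TF-DNA interactions is now regarded as a major cause of trait variation. In this project, we will focus on using computational methods to dissect the relationship between non-coding genetic variants and complex traits. First, based on recurrent neural networks and the adversarial training principle, we will develop a new DNA sequence generation method that supplies high-quality control samples for downstream analysis. Second, based on machine learning techniques such as multitask learning and structured sparsity, we will propose computational methods that directly predict the binding motifs of dimers. Third, by integrating collaborative filtering and recurrent neural networks, we will study methods for imputing high-throughput data. In addition, we will study new deep learning network architectures and integrate multiple types of genomic data in order to predict the qualitative relationship between genetic mutations and trait changes more precisely. The results of this project will help identify disease targets and provide a basis for the precise diagnosis and treatment of complex diseases.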
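Research on the regulatory role of non-coding mutations faces several problems, with major unsolved challenges in TF binding models, in associating TFs with the genes they regulate, and in analyzing the association between binding variation and phenotypic variation. In this project, we focused on using computational methods to dissect the relationship between non-coding genetic variants and complex traits. First, we proposed a new DNA sequence generation method based on recurrent neural networks and adversarial training, which supplies high-quality control samples for downstream analysis. Second, we proposed machine learning techniques based on multitask learning and structured sparsity for predicting the binding motifs of dimers. Third, we integrated collaborative filtering and recurrent neural networks to study methods for imputing high-throughput data. In addition, we studied new deep learning network architectures and integrated multiple types of genomic data in order to predict the qualitative relationship between genetic mutations and trait changes more precisely. This work helps elucidate potential biological regulatory mechanisms and the interactions among the functional layers of living systems, thereby providing theoretical support for understanding disease pathogenesis and designing related drug targets.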
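To make the idea of a variant disrupting a TF-DNA binding motif concrete, the sketch below scores a single-nucleotide variant against a position weight matrix. It is a classical baseline offered for illustration, not one of the deep learning models described in this project. The motif frequencies, the uniform background, and the names `PFM`, `log_odds`, `best_score`, and `variant_effect` are all assumptions introduced here.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (one dict per position).
# The numbers are invented for illustration and describe no real TF.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(window):
    """Log2-odds score of one motif-length window against the background."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_score(seq):
    """Highest log-odds score over all motif-length windows of seq."""
    k = len(PFM)
    return max(log_odds(seq[i:i + k]) for i in range(len(seq) - k + 1))


def variant_effect(ref_seq, pos, alt):
    """Change in best motif score when the base at 0-based pos is replaced by alt."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_score(alt_seq) - best_score(ref_seq)


if __name__ == "__main__":
    # The reference contains the consensus AGCT; the variant changes G to A.
    delta = variant_effect("TTAGCTTT", 3, "A")
    print(f"score change: {delta:.3f}")
```

A negative score change suggests the alternate allele weakens the best predicted binding site. The project's models aim to go beyond this kind of single-motif scoring by integrating several types of genomic data.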
Data last updated: 2023-05-31
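The DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens
Channel allocation strategy for low-Earth-orbit satellite communications
Small-UAV remote sensing image registration with inlier maximization and redundant point control
Combined transcriptome and metabolome analysis of the mechanism of anthocyanin changes in red maple leaves
Theme park evaluation based on public sentiment orientation: a case study of Volga Manor in Harbin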
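Comparison of the modes of action of genetic variation and selection on coding and non-coding regions of the genome
Study of the relationship between structure and function of the 3' non-coding region of the classical swine fever virus genome
Mechanisms of interaction between the transcription factor SOX2 and long non-coding RNAs in genome regulation
Genome-wide coding-region genetic variants and the risk and mechanisms of oral squamous cell carcinoma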