Recent biological studies have revealed that most variants associated with complex traits lie in non-coding regulatory regions of the genome, where they have been shown to disrupt transcription factor (TF)-DNA binding motifs. Variation in TF-DNA interactions is therefore increasingly regarded as a key driver of phenotypic variation. In this project, we will focus on developing effective computational methods for dissecting and identifying the molecular roles of non-coding genetic variants in complex traits. First, based on recurrent neural networks and the adversarial training principle, we will develop a novel method for generating artificial DNA sequences that can serve as high-quality negative controls for downstream analysis. Second, based on machine learning techniques such as multitask learning and structured sparsity, we will propose computational approaches for predicting the binding motifs of heterodimers in silico. Third, we will predict unobserved high-throughput protein-DNA data by integrating collaborative filtering and recurrent neural network techniques. Fourth, we will study advanced weakly supervised deep learning models that exploit multilayered information from different types of genomic data to predict causal variants and their effects on TF binding more precisely. The outcomes of this project could benefit the search for disease markers and provide further theoretical support for the precise diagnosis and treatment of complex diseases.
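Research on the regulatory role of non-coding variants faces several open problems: major challenges remain unresolved in modeling transcription factor binding, in linking transcription factors to the genes they regulate, and in relating binding variation to phenotypic variation. In this project, we focused on computational methods for dissecting the relationship between non-coding genetic variants and complex traits. First, we proposed a new DNA sequence generation method based on recurrent neural networks and adversarial training, which provides high-quality control samples for downstream analysis. Second, we proposed methods based on machine learning techniques such as multitask learning and structured sparsity to predict the binding motifs of heterodimers. Third, we studied methods for imputing high-throughput data by integrating collaborative filtering and recurrent neural networks. In addition, we studied new deep learning network architectures and integrated multiple types of genomic data to predict more precisely the qualitative relationship between genetic mutations and trait changes. This work helps to elucidate potential biological regulatory mechanisms and the interactions among different layers of biological function, and thereby provides theoretical support for understanding disease pathogenesis and designing drug targets.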
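The notion that a non-coding variant can disrupt a TF-DNA binding motif can be illustrated with a position weight matrix (PWM) score. The sketch below is not the project's method (the project describes deep learning models); it is a minimal classical baseline under stated assumptions: `TOY_PWM` is a made-up four-position motif, the background is uniform, and all function names are hypothetical.

```python
import math

# Hypothetical 4-position motif (illustrative values only, not real TF data).
# Each column gives base probabilities at one motif position and sums to 1.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_odds_score(window, pwm, background=0.25):
    """Log2-odds score of one window against a uniform background (assumption)."""
    return sum(math.log2(col[base] / background) for base, col in zip(window, pwm))


def best_window_score(seq, pwm):
    """Highest log-odds score over all motif-length windows of seq."""
    width = len(pwm)
    return max(
        log_odds_score(seq[i:i + width], pwm)
        for i in range(len(seq) - width + 1)
    )


def variant_effect(ref_seq, pos, alt_base, pwm):
    """Change in best motif score when ref_seq[pos] is replaced by alt_base.

    A negative value suggests the variant weakens the best predicted site.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_window_score(alt_seq, pwm) - best_window_score(ref_seq, pwm)


if __name__ == "__main__":
    # The reference contains the consensus ACGT; the variant changes C to T.
    delta = variant_effect("GGACGTGG", 3, "T", TOY_PWM)
    print(f"score change: {delta:.2f}")
```

A score difference of this kind only captures the effect on a single motif match; the models proposed in the project aim to integrate several types of genomic data instead.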
Data last updated: 2023-05-31
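Research progress on mechanisms of resistance to osimertinib in patients with non-small cell lung cancer
Long intergenic non-coding RNA 00681 promotes melanoma cell invasion and migration by competitively binding miR-16
Molecular dynamics simulation of the shear-thinning behavior of non-Newtonian fluids
Modern optimization theory and applications
Excess pore water pressure effects in the sliding zone during strong earthquakes: the initiation mechanism of the Daguangbao landslide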
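Comparison of the modes of action of genetic variation and selection on coding and non-coding regions of the genome
Structure-function relationships of the 3' non-coding region of the classical swine fever virus genome
Mechanisms of interaction between the transcription factor SOX2 and long non-coding RNAs in genome regulation
Genome-wide coding-region genetic variation and the risk and mechanisms of oral squamous cell carcinoma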