The identification and analysis of cis-regulatory motifs is an important problem in computational biology, and it provides key information for inferring the transcriptional regulatory networks encoded in a cell. Here we propose a comprehensive study on the accurate prediction and systematic analysis of cis-regulatory motifs in the human genome, based on large-scale ChIP-seq data sets in the public domain. First, we will design a novel motif prediction method that operates on the TF binding regions provided by a given ChIP-seq data set, aiming to improve on both the accuracy and the efficiency of state-of-the-art methods. This method integrates hash tables, graph theory, and combinatorial optimization. Specifically, it transforms the challenging problem of identifying motif length into a maximal weighted path problem on a de Bruijn graph model, and it supports co-factor and discriminative motif finding, leading to a new function for predicting co-factor motif modules. Second, the read depth information of binding activity will be used to reduce the bias caused by the random proportion of motif segments in existing motif representation models, with the goal of improving the performance of motif searching and comparison algorithms. Third, we will develop a new framework for constructing and analyzing transcriptional regulatory networks, based on the knowledge and insights gained from the preceding two studies and on large-scale TF binding data in the public domain. Finally, an integrated software system covering all of the above studies will be developed and deployed on a web server, so that researchers with limited computational background can use it. We believe that the proposed studies will substantially improve the performance of ChIP-seq based motif finding and enhance the analysis and application of cis-regulatory motifs. 
The new insights gained and the new computational technology developed in this project will enable a large community of biology researchers to conduct a broad range of data analysis studies that are currently not feasible.
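The idea of reading motif length off a heavy path in a de Bruijn graph can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes that nodes are k-mers weighted by their occurrence count, that edges join k-mers overlapping by k-1 bases, and that a greedy extension from the most frequent k-mer stands in for a true maximal weighted path search. The names `kmer_counts`, `best_extension`, `greedy_heavy_path`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def kmer_counts(seqs, k):
    """Count every k-mer in the sequences (a hash table keyed by k-mer)."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def best_extension(counts, candidates, visited, min_count):
    """Return the heaviest unvisited candidate k-mer, or None if none qualifies."""
    best = None
    for kmer in candidates:
        if kmer in visited or counts[kmer] < min_count:
            continue
        if best is None or counts[kmer] > counts[best]:
            best = kmer
    return best

def greedy_heavy_path(counts, min_count=2):
    """Greedily grow a heavy path from the most frequent k-mer (assumes k >= 2).

    This is a heuristic stand-in for a maximal weighted path search. The path
    stops growing when no neighbouring k-mer reaches min_count, so the length
    of the returned string is decided by the data rather than fixed in advance.
    """
    # Sorting first makes the choice among equally frequent seeds deterministic.
    seed = max(sorted(counts), key=lambda kmer: counts[kmer])
    k = len(seed)
    visited = {seed}
    path = seed
    # Extend to the right: neighbours share the last k-1 bases of the path.
    while True:
        suffix = path[-(k - 1):]
        nxt = best_extension(counts, [suffix + b for b in BASES], visited, min_count)
        if nxt is None:
            break
        visited.add(nxt)
        path += nxt[-1]
    # Extend to the left: neighbours share the first k-1 bases of the path.
    while True:
        prefix = path[:k - 1]
        prev = best_extension(counts, [b + prefix for b in BASES], visited, min_count)
        if prev is None:
            break
        visited.add(prev)
        path = prev[0] + path
    return path
```

Called on a handful of toy sequences that share one planted site with differing flanks, `greedy_heavy_path(kmer_counts(seqs, 4))` would return the shared site, with its length emerging from where the k-mer counts drop off. Real ChIP-seq input would need degenerate positions, reverse complements, and background correction, none of which this sketch handles.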
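The prediction and analysis of cis-regulatory motifs is an important problem in computational biology and is key to studying regulatory mechanisms in living organisms. This project uses the transcription factor binding regions provided by ChIP-seq to accurately predict and systematically analyze cis-regulatory motifs in the human genome. To address the computational difficulty posed by the large scale of human ChIP-seq data, the project designs a motif prediction algorithm that combines hash tables, graph theory, and combinatorial optimization, improving both efficiency and accuracy. It also introduces de Bruijn graphs to solve the problem of determining motif length, and it integrates important functions such as co-factor motif and discriminative motif prediction into the algorithm. The project uses the coverage depth of short sequencing reads to improve the motif representation model, and thereby to raise the accuracy of motif analysis algorithms such as motif search and comparison. Based on large-scale transcription factor data, and through motif prediction and analysis, the project explores methods for constructing transcriptional regulatory networks and uses graph models to analyze regulatory network modules. The final outcome is a software system for efficient prediction and systematic analysis of cis-regulatory motifs, offered as an online web service. The completion of this project will substantially improve the prediction of regulatory motifs, enable in-depth analysis of regulatory motifs, and advance research on transcriptional regulatory mechanisms.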
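One way to picture how read coverage depth could enter the motif representation model is a position frequency matrix in which each binding site contributes in proportion to its depth instead of counting once. The sketch below assumes exactly that weighting scheme, which the abstract does not specify; the function `weighted_pfm` and its `pseudocount` parameter are hypothetical.

```python
def weighted_pfm(sites, depths, pseudocount=0.0):
    """Build a position frequency matrix weighted by per-site read depth.

    sites  : equal-length DNA strings (aligned motif instances)
    depths : one non-negative weight per site, e.g. its read coverage
    Returns a list with one dict per position mapping base -> frequency.
    Assumes every position has a positive total weight.
    """
    length = len(sites[0])
    matrix = [{base: pseudocount for base in "ACGT"} for _ in range(length)]
    # Each site adds its depth, not 1, to the base observed at each position.
    for site, depth in zip(sites, depths):
        for pos, base in enumerate(site):
            matrix[pos][base] += depth
    # Normalize every column so its frequencies sum to 1.
    for column in matrix:
        total = sum(column.values())
        for base in column:
            column[base] /= total
    return matrix
```

With two sites `"ACG"` and `"ATG"` at depths 3 and 1, the second column gives C three times the frequency of T, whereas an unweighted matrix would treat them as equally likely.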
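The prediction and analysis of cis-regulatory motifs is an important problem in computational biology and is key to studying regulatory mechanisms in living organisms. This project made full use of next-generation sequencing data to study new algorithms for motif prediction in the human genome and new methods for motif analysis and application. By introducing new theory, techniques, and computational models, it addressed computational bottlenecks and existing problems, improved the accuracy and efficiency of motif prediction, and produced corresponding software with an online web service. Building on this, it combined large-scale ChIP-seq data with expression data for regulatory analysis, which increased the interpretability of these data and advanced research in transcriptional regulation. The project also studied, alongside the human genome, the microbiome, which has complex associations with complex human diseases, covering aspects such as microbial genome structure and transcriptional regulatory units. The main results include a series of motif prediction and transcription unit prediction algorithms such as DESSO, CEMIG, WTSA, and seqATU, as well as an lncRNA-gene regulatory network prediction algorithm; a web server for DESSO, the ChIP-seq based motif prediction algorithm; eight SCI-indexed papers, including papers in leading and mainstream bioinformatics journals such as Nucleic Acids Research (IF 16.971), Briefings in Bioinformatics (IF 11.622), and Bioinformatics (IF 6.937); and four graduated doctoral and master's students. The completion of this project will substantially improve the prediction of regulatory motifs, enable in-depth analysis of regulatory motifs, and advance research on transcriptional regulatory mechanisms.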
Data last updated: 2023-05-31