Accurately identifying ligand-binding sites in proteins is important for understanding protein function, analyzing relationships between biological molecules, and guiding drug design. As protein data continue to accumulate, they have become massive, multi-source, and complementary by nature, so intelligent computational methods for predicting ligand-binding sites from protein data are highly desirable. In this project, we will systematically study models and methods for protein-ligand binding site prediction, based on a thorough evaluation of the characteristics of protein data and the bottlenecks of existing prediction methods. Our major research topics are: 1) extracting high-quality features from multi-source, complementary protein data; 2) proposing a new query-driven prediction model that fuses data cleaning with incremental learning, to cope with the massive and ever-growing volume of protein data; 3) improving ligand-specific prediction by employing multi-label learning; 4) improving the efficacy of deep learning models for ligand-binding site prediction by introducing known biological knowledge of proteins and additional knowledge extracted from unlabeled protein data; and 5) building, on the basis of these theoretical results, an online platform for high-throughput identification of protein-ligand binding sites. We expect to develop effective machine learning theory and algorithms that meet the urgent demands of protein data analysis and processing in proteomics, and to provide theoretically sound, practically useful, and highly interpretable computational models of proteins for biologists and general users. Our research will enrich the field of protein computation, speed up the unveiling of the relationships among protein sequence, structure, and function, and further advance bioinformatics.
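Accurately identifying ligand-binding sites in proteins is of great value for understanding protein function, analyzing interactions between biomolecules, and guiding drug design. As protein data continue to accumulate, being massive, multi-source, and complementary has become an inherent characteristic of current protein data, and efficient, reliable computational methods for predicting protein ligand-binding sites are urgently needed. Targeting these inherent characteristics, and building on an analysis of the shortcomings and bottlenecks of existing prediction methods, this project carries out research to further improve ligand-binding site prediction performance. The specific topics are: 1) feature extraction and selection for multi-source, complementary protein data; 2) a query-driven method based on data cleaning and incremental learning; 3) a ligand-specific prediction method incorporating multi-label learning; 4) deep learning methods that make full use of prior protein information and of protein data whose function is not annotated; 5) development of a high-throughput ligand-binding site prediction platform. This research aims to provide biologists and related researchers with an application system that is theoretically well grounded, accurate, and highly interpretable; to further enrich the theory and methods of ligand-binding site prediction; and, as a contribution to computational biology, to help advance bioinformatics.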
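Accurately identifying ligand-binding sites in proteins is of great value for understanding protein function, analyzing interactions between biomolecules, and guiding drug design. This project addressed the continuous accumulation of protein data and its massive, multi-source, complementary nature and, building on an analysis of the shortcomings and bottlenecks of existing prediction methods, carried out research on methods for predicting protein-ligand binding sites. For protein feature extraction, we proposed: 1) a protein-DNA interaction prediction method based on weighted convolutional features, 2) a protein-DNA interaction site prediction method based on cube-format features, and 3) a protein solvent accessibility prediction method based on a deep multi-view feature learning framework. For ligand-specific template matching, we proposed: 1) a protein-ligand binding site prediction method based on query specificity and support vector machines, and 2) an ATP-specific protein-ligand docking method based on template queries. For applications of deep learning, we proposed a protein-ATP binding site prediction method based on deep convolutional neural networks. The results of this project can provide biologists and related researchers with accurate, highly interpretable ligand-binding site prediction methods; they also enrich and extend computational biology and help further advance bioinformatics.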
Data last updated: 2023-05-31