Protein post-translational modifications (PTMs) are closely associated with dementia, heart disease, cancer and other diseases. Accurate identification of PTM sites is therefore important for understanding disease mechanisms and for guiding drug design. Experimental identification of modification sites is time-consuming and costly, and protein data are now accumulating at a massive scale. Efficient and reliable computational methods for predicting PTM sites have therefore become an urgent need. This project aims to conduct in-depth research on three aspects of PTM site prediction: data selection, feature extraction and model construction. First, we propose an effective algorithm for labeling negative samples and use it to build a reliable negative sample set. This removes the adverse effect that negative samples chosen under the "accept or reject" rule have on the prediction model. Next, we use homology modeling to construct the three-dimensional structures of proteins that contain PTM sites. On that basis, we explore novel, discriminative structural and network features. Finally, we apply ensemble learning algorithms to establish species-specific PTM site prediction methods. We also develop two tools: an online prediction and analysis platform that can quickly and accurately label multiple types of PTM sites, and an offline software package for large-scale prediction. The results are expected to give biologists effective computational models of PTM sites and practical software. They should also provide a theoretical basis for drug design against the related diseases.
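The abstract does not say how the reliable negative set is built. As one illustration, the sketch below uses a simple centroid-similarity heuristic of the kind used in positive-unlabeled (PU) learning. Unlabeled candidate peptides whose amino acid composition is least similar to the mean composition of known modified peptides are kept as reliable negatives. The composition encoding, the cosine-similarity ranking and the names `composition`, `reliable_negatives` and `fraction` are assumptions made for this example. This is not the project's published algorithm.

```python
from __future__ import annotations

import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(peptide: str) -> list[float]:
    """Fraction of each standard residue (padding such as 'X' only adds to length)."""
    counts = Counter(peptide)
    n = len(peptide) or 1
    return [counts[aa] / n for aa in AMINO_ACIDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def reliable_negatives(positives: list[str], unlabeled: list[str],
                       fraction: float = 0.5) -> list[str]:
    """Keep the `fraction` of unlabeled peptides least similar to the positive centroid.

    Hypothetical heuristic for illustration only; ties keep their input order
    because sorted() is stable.
    """
    pos_vecs = [composition(p) for p in positives]
    # Centroid = per-residue mean composition of the known positive peptides.
    centroid = [sum(col) / len(pos_vecs) for col in zip(*pos_vecs)]
    ranked = sorted(unlabeled, key=lambda p: cosine(composition(p), centroid))
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]
```

Take positives `["KKKKK", "KKRKK"]` and candidates `["KKKKR", "DDEED", "KKKDD", "EEEEE"]` with `fraction=0.5`. The function keeps the two lysine-free peptides, because they share no residues with the positive centroid. In practice the cut-off would need tuning, and richer encodings than composition would likely be required.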
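This project began with an analysis of the shortcomings and bottlenecks of existing species-specific PTM site prediction models. It then carried out systematic research at three levels: reliable negative sample selection, feature extraction and learning algorithms. For reliable negative sample selection, the project proposed two new learning algorithms based on positive examples. Compared with the conventional practice of selecting negative samples at random, these algorithms markedly improved the accuracy of several prediction models. The project also extracted and comparatively analyzed a range of features related to PTM sites, including evolutionary conservation, physicochemical properties, amino acid network features and tertiary-structure features. It also studied how the feature distributions of the same PTM differ between species, which confirmed the need for species-specific prediction models. Finally, species-specific PTM site prediction methods were built on novel machine learning algorithms, including support vector machines, decision trees and conditional random fields, to further improve prediction accuracy. A prediction platform was developed for several types of PTM sites, such as succinylation, glutathionylation and glycosylation. The work gives biologists effective computational models and practical software. It also contributes to a better understanding of the specificity mechanisms of PTMs and offers new ideas and perspectives for screening potential drug targets and for drug design. Funded by the Young Scientists Fund, the project was completed as planned. The principal investigator published a total of 9 SCI-indexed papers as first author or corresponding author.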
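The summary names physicochemical features and cross-species comparisons of feature distributions but does not give an encoding. The minimal sketch below assumes a symmetric window of `flank` residues around each candidate lysine (K, the residue modified in succinylation). It scores each window with the Kyte-Doolittle hydropathy scale, one example of a physicochemical property, and compares the mean score per species. The function names, the padding character `X` and the species data are hypothetical. A real analysis would compare full distributions with a statistical test rather than means alone.

```python
from __future__ import annotations

import math

# Kyte-Doolittle hydropathy scale; higher values are more hydrophobic.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def site_windows(sequence: str, residue: str = "K", flank: int = 3, pad: str = "X"):
    """Yield (index, window) for each occurrence of `residue`.

    The sequence is padded with `pad`, so sites near either terminus still get
    windows of length 2 * flank + 1 centred on the residue.
    """
    padded = pad * flank + sequence + pad * flank
    for i, aa in enumerate(sequence):
        if aa == residue:
            # The residue at sequence index i sits at padded index i + flank.
            yield i, padded[i : i + 2 * flank + 1]


def mean_hydropathy(window: str) -> float:
    """Mean hydropathy of the window, skipping padding and non-standard residues."""
    values = [KD_HYDROPATHY[aa] for aa in window if aa in KD_HYDROPATHY]
    return sum(values) / len(values) if values else 0.0


def hydropathy_by_species(proteins_by_species: dict[str, list[str]],
                          residue: str = "K", flank: int = 3) -> dict[str, float]:
    """Average window hydropathy over all candidate sites, per species (NaN if none)."""
    summary = {}
    for species, sequences in proteins_by_species.items():
        scores = [mean_hydropathy(window)
                  for seq in sequences
                  for _, window in site_windows(seq, residue, flank)]
        summary[species] = sum(scores) / len(scores) if scores else math.nan
    return summary


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteome data.
    toy = {"species_a": ["MKTAKQ", "AKA"], "species_b": ["DKDEKE"]}
    print(hydropathy_by_species(toy, flank=1))
```

Hydropathy is only a stand-in here. Conservation scores, amino acid network descriptors or structure-derived features could be computed per window in the same way and compared across species.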
Data last updated: 2023-05-31
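Related projects:
Prediction of protein post-translational modification sites based on multi-label learning
Prediction of protein post-translational modification sites using deep learning methods
Research on machine-learning-based prediction of protein post-translational modification sites
Prediction of Tau protein post-translational modification sites and study of modification mechanisms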