Large-scale machine learning systems have achieved remarkable results in many application fields thanks to their powerful learning ability. However, the compute-intensive and I/O-intensive nature of machine learning algorithms remains a major obstacle to pushing machine learning toward higher performance. Targeting the evolving characteristics of parallel machine architectures, computing-resource scaling models, and machine learning workloads, this project will study a parallel computing model for intensive iterative-convergent machine learning applications, construct a performance model, and theoretically analyze the universality, scalability, and fault tolerance of that model; it will then develop performance-optimization guidance theory for the model's data, computation, and communication layers. Building on this foundation, and centered on the characteristics of intensive iterative-convergent machine learning algorithms, the project will study loose parallel scheduling strategies, establish a multi-iteration synchronization management model, and investigate a communication management model through a continuous-communication strategy, a priority-setting strategy, and a parameter time-shared broadcast strategy. To break through the limitations of traditional approaches in choosing optimization methods and optimization timing, the project will build a runtime-overhead analysis model for machine learning algorithms and study dynamic resource management methods, using model selection and resource combination strategies to effectively improve resource utilization and the parallel performance of the system. The results of this project can substantially improve the performance of large-scale machine learning and are of great significance for the intelligent analysis of big data applications.
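To make the "loose parallel scheduling" idea concrete, the sketch below simulates, in a single Python process, the general bounded-staleness pattern that iterative-convergent workloads commonly rely on: each worker repeatedly applies updates against possibly stale shared parameters, but may not run more than a fixed number of iterations ahead of the slowest worker. This is an illustrative assumption only; the names (worker_step, STALENESS) and the toy least-squares data are hypothetical and do not describe the project's actual system.

```python
"""Minimal sketch of bounded-staleness ("loose") parallel scheduling for an
iterative-convergent workload.  Illustration only; not the project's design."""

import numpy as np

STALENESS = 2        # a worker may run at most this many iterations ahead
NUM_WORKERS = 4
NUM_ITERS = 50
LEARNING_RATE = 0.1

# Toy least-squares problem; each worker owns one shard of the data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(400, 2))
y = X @ true_w + 0.01 * rng.normal(size=400)
shards = np.array_split(np.arange(400), NUM_WORKERS)

w = np.zeros(2)                 # shared model parameters
clock = [0] * NUM_WORKERS       # per-worker iteration counters

def worker_step(k: int) -> np.ndarray:
    """One gradient step on worker k's shard against the (possibly stale) w."""
    idx = shards[k]
    grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    return -LEARNING_RATE * grad

# Workers progress at uneven speeds (random selection); a worker that is more
# than STALENESS iterations ahead of the slowest one must wait.
for _ in range(NUM_ITERS * NUM_WORKERS * 3):
    k = int(rng.integers(NUM_WORKERS))
    if clock[k] - min(clock) > STALENESS:
        continue                # too far ahead: yield to stragglers
    w += worker_step(k)         # push the update to the shared parameters
    clock[k] += 1
    if min(clock) >= NUM_ITERS:
        break

print("learned parameters:", w)   # converges close to true_w
```

Because the staleness bound only delays, rather than blocks, fast workers, such schemes trade a small amount of update freshness for much less synchronization stalling than strict bulk-synchronous execution.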