With the rapid development of high-performance computing platforms and the continual expansion of application fields for data-intensive computing, both the complexity of the computation involved and the volume of data transferred have been rising, and the resulting high energy consumption urgently needs to be addressed. Because the transmission of large data sets significantly affects the completion time and energy consumption of such tasks, an energy-efficient scheduling mechanism for this environment must be combined with replica selection and data replication strategies. This project studies replica-aware, energy-efficient scheduling optimization mechanisms and algorithms for data-intensive computing. First, based on the weighted set cover problem, a replica selection algorithm is proposed that takes both access cost and transmission time into account. The algorithm reduces the volume of data transferred while providing users with economical data resources. Second, an energy consumption model that reflects the characteristics of tasks is established. To balance performance, economic cost, and energy consumption, an energy-efficient scheduling algorithm is designed under constraints on the completion time and cost budget of an independent task set. Third, an energy-efficient three-phase dynamic data replication strategy is put forward, which aims at a trade-off among system performance, data availability, and energy consumption. Finally, a simulator for data-intensive computing is developed to evaluate the effectiveness and performance of the proposed algorithms. The simulation results are analyzed theoretically to provide a basis for selecting and applying the algorithms in real environments.
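The weighted set cover idea behind replica selection can be illustrated with the classic greedy heuristic. This is a minimal sketch and not the project's algorithm: the function name `select_replica_sites`, the site dictionary layout, and the single scalar cost per site (assumed here to already combine access cost and transmission time) are all hypothetical.

```python
def select_replica_sites(required, sites):
    """Greedy weighted set cover (illustrative sketch, not the project's algorithm).

    required: set of dataset ids a task needs.
    sites:    dict mapping a site name to (set of dataset ids held, cost).
              The cost is a hypothetical scalar that is assumed to combine
              access cost and transmission time.
    Returns the list of chosen site names, in selection order.
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        best_site = None
        best_ratio = float("inf")
        for name, (held, cost) in sites.items():
            gain = len(held & uncovered)  # datasets this site would newly cover
            if gain == 0:
                continue
            ratio = cost / gain  # cost per newly covered dataset
            if ratio < best_ratio:
                best_site, best_ratio = name, ratio
        if best_site is None:
            raise ValueError("some required datasets have no replica at any site")
        chosen.append(best_site)
        uncovered -= sites[best_site][0]
    return chosen
```

The greedy rule picks, at each step, the site with the lowest cost per newly covered dataset. It is a standard approximation for weighted set cover and does not guarantee a minimum-cost selection.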
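This project studied replica-aware, energy-efficient scheduling optimization mechanisms and algorithms for data-intensive computing. The research group first introduced market economy theory into the study of replica selection and designed an efficient replica selection algorithm that both minimizes the total access cost of the replica set and reduces the migration of large data sets as far as possible. Then, with the replica selection result fixed, the group analyzed the characteristics of data-intensive computing environments and the energy consumed by each independent task during execution, and established a per-task energy consumption model covering computing resource energy, network energy, and data read/write energy. Based on this energy model and a power function, the energy-aware task scheduling problem was formulated as a combinatorial optimization problem, and an energy-efficient scheduling algorithm constrained by completion time and cost budget was proposed. Next, taking system performance, data availability, and replication energy consumption into account together, the group proposed a three-phase dynamic data replication and placement strategy. The strategy aims to reduce the energy consumed during data replication and placement while guaranteeing system performance and data availability. To overcome the difficulty that unstable configurations in real environments pose for evaluating the algorithms, the project developed DCRMSim (Data-intensive Computing Resource Management Simulator), a simulator suited to resource management in data-intensive computing environments. By adjusting its configuration files, DCRMSim can simulate typical data-intensive computing scenarios. The project satisfactorily met the goals set out in the proposal. During its execution, 3 related papers were published, 2 of them SCI-indexed, and 9 master's students were trained.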
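The per-task energy model and the constrained scheduling decision described above can be sketched as follows. This is an illustration under stated assumptions, not the project's model: the `Node` fields, the constant power draw per component, the linear time estimates, and the per-task (rather than task-set) deadline and budget are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # All fields are hypothetical parameters chosen for illustration.
    name: str
    speed: float      # work units processed per second
    cpu_power: float  # watts drawn while computing
    net_power: float  # watts drawn while transferring data
    io_power: float   # watts drawn while reading/writing data
    bandwidth: float  # MB/s for fetching the selected replicas
    io_rate: float    # MB/s for local reads/writes
    price: float      # cost units charged per second of use


def task_time_energy(work, data_mb, node):
    """Return (time in s, energy in J) as the sum of compute, network and I/O parts."""
    t_exec = work / node.speed
    t_net = data_mb / node.bandwidth
    t_io = data_mb / node.io_rate
    energy = (node.cpu_power * t_exec
              + node.net_power * t_net
              + node.io_power * t_io)
    return t_exec + t_net + t_io, energy


def pick_node(work, data_mb, nodes, deadline, budget):
    """Choose the lowest-energy node that meets the deadline and the cost budget."""
    best_node, best_energy = None, float("inf")
    for node in nodes:
        time, energy = task_time_energy(work, data_mb, node)
        cost = time * node.price
        if time <= deadline and cost <= budget and energy < best_energy:
            best_node, best_energy = node, energy
    return best_node  # None when no node satisfies both constraints
```

Scheduling a whole set of independent tasks under a shared deadline and budget is a combinatorial problem, as the summary notes. The sketch only shows the per-task energy accounting and feasibility check that such an algorithm would build on.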
Data last updated: 2023-05-31
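Research on energy-efficiency optimization strategies for data-intensive tasks in cloud computing environments
Research on runtime optimization mechanisms for data-intensive computing on emerging architectures
Research on energy-efficient optimization methods for resources in multi-mode computing big data systems
Research on runtime environment support technologies for online data-intensive computing with tail-latency optimization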