Transient faults and partial permanent faults place increasingly strict demands on the design of highly reliable multi-core processor chips. Failure recovery at the software or system level cannot simultaneously guarantee transparency, determinism, and high availability. Hardware-based recovery has clear advantages, and the higher integration density and better scalability expected of future multi-core processors make chip-level failure recovery feasible. This project explores highly reliable chip-level failure recovery methods and models for multi-core processors. First, we will study rollback recovery based on a chip-level hardware checkpoint mechanism (checkpoint/restart), to achieve transparency, generality, and high availability in transient fault recovery. Building on this, we will propose a new separated log recording mechanism for memory races, to guarantee deterministic execution during transient fault recovery. To achieve low-cost, fine-grained recovery from partial permanent faults, we will study area-constrained evolvable hardware algorithms. Finally, by analyzing the execution modes of multi-core processors under various fault conditions, we will study a multi-mode failure recovery model so that the recovery process adapts to the fault at hand. The research is intended to provide a theoretical basis and technical support for the design of future highly reliable multi-core processors.
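With the rapid development of the integrated circuit industry, the scale and complexity of multi-core processor circuits keep growing, and highly reliable multi-core processor chips are essential for coping with the hardware faults that may occur in them. Starting from chip-level failure recovery methods and taking the needs of different application environments into account, this project proposed a highly reliable multi-core processor chip model based on multi-mode failure recovery. The model supports three hardware-based recovery modes: a hardware checkpoint mode, a hardware deterministic recovery mode, and a reconfigurable recovery mode. Users can select the appropriate mode according to the type of fault that occurs in a given environment. To evaluate the recovery efficiency of the model, an FPGA-based verification platform for failure recovery in reconfigurable multi-core processors was built. The platform uses an on-chip FPGA multi-core system to support compiling, running, and debugging multithreaded programs and the simulator; a random fault injection tool emulates the faults seen in real applications, and the multi-mode chip failure recovery model repairs them. On the basis of extensive experiments, the project team published more than 20 high-level papers and filed 6 invention patent applications. The experimental results show that the model effectively handles hardware faults in multi-core processors, such as transient faults and partial permanent faults, and can provide a theoretical basis and technical support for the design of future highly reliable multi-core processors.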
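The three recovery modes can be illustrated with a small software analogy. The sketch below is not the project's hardware design: the names `Fault`, `Mode`, `select_mode`, and `Core` are hypothetical, the mapping from fault type to mode is an assumption drawn from the description above, and a single integer stands in for architectural state. It only shows the idea of rolling back to a checkpoint, optionally replaying a log so that recovery is deterministic, and routing partial permanent faults to a separate reconfiguration path.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Fault(Enum):
    TRANSIENT = auto()
    PARTIAL_PERMANENT = auto()


class Mode(Enum):
    CHECKPOINT = auto()     # roll back to the last checkpoint
    DETERMINISTIC = auto()  # roll back, then replay the recorded log
    RECONFIGURE = auto()    # repair by reconfiguring hardware


def select_mode(fault, need_determinism=False):
    """Pick a recovery mode (assumed policy, for illustration only)."""
    if fault is Fault.PARTIAL_PERMANENT:
        return Mode.RECONFIGURE
    return Mode.DETERMINISTIC if need_determinism else Mode.CHECKPOINT


@dataclass
class Core:
    """Toy core: one integer stands in for the architectural state."""
    state: int = 0
    checkpoint: int = 0
    log: list = field(default_factory=list)

    def take_checkpoint(self):
        # Save the current state and start a fresh log.
        self.checkpoint = self.state
        self.log.clear()

    def step(self, value):
        # Record the input before applying it, so it can be replayed.
        self.log.append(value)
        self.state += value

    def recover(self, mode):
        if mode is Mode.RECONFIGURE:
            # Hardware reconfiguration has no software analogue here.
            raise NotImplementedError("reconfiguration is not modeled")
        self.state = self.checkpoint
        if mode is Mode.DETERMINISTIC:
            # Replay the logged inputs in their original order.
            for value in self.log:
                self.state += value
        else:
            # Plain rollback discards work done since the checkpoint.
            self.log.clear()
```

In this toy model, plain checkpoint recovery loses the work done since the last checkpoint, whereas the deterministic mode reconstructs it from the log. That is the trade-off the two transient-fault modes are meant to cover.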
Data updated: 2023-05-31
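Research on Task Module Generation and Scheduling/Mapping Methods for Multi-core Processors
Hardware-Software Cooperative Transactional Memory Architecture for Multi-core Processors
Research on Object-Oriented Cache Architecture Techniques in Multi-core Processors
Research on Key Low-Power Techniques for Shared-Cache Multi-core Processors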