Technical Field
The present invention generally relates to computers and, in particular, to maintaining system reliability in a CPU with co-processors.
Description of the Related Art
Currently, many computer systems employ accelerators (e.g., co-processors such as Graphics Processing Units (GPUs)) to enhance the performance of such systems, where programs run on both the CPU and the accelerators. In order to improve system-level reliability, conventional approaches focus on improving the reliability of the CPUs. However, among other deficiencies as readily appreciated by one of ordinary skill in the art, such conventional approaches fail to consider errors that can occur in the accelerators and do not consider how to recover from such errors. Thus, there is a need for an improved approach for enhancing the system-level reliability of computer systems that use accelerators.