Graphics processing units (GPUs) are designed specifically for performing complex mathematical and geometric calculations, e.g., for graphics rendering. Modern GPUs produce high-fidelity images faster than general-purpose central processing units (CPUs). The highly parallel structure of GPUs also makes them more effective than CPUs for algorithms that process large blocks of data in parallel. Accelerated graphics, and parallel computing generally, are becoming increasingly important in servers and datacenters. As this trend continues, it will become necessary to protect critical systems from errors caused by GPU failures.
Application programming interfaces (APIs), as part of the graphics driver architecture, manage the pipelined graphics commands and resources that are received from applications and are to be rendered by the GPU. In the case of a GPU failure, these APIs notify each affected application that its graphics commands and resources have been lost, e.g., by sending a “DeviceLost” error message. Each application is then responsible for recovering from the failure, e.g., by deriving the graphics state at the time of the failure and reissuing the commands and resources to the recovered GPU or to another GPU. Leaving the responsibility for recovery to applications can be slow and may lead to inconsistent and often undesirable results, as some applications fail to recover properly.
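By way of illustration only, the application-side recovery described above may be sketched as follows. The sketch is a simplified simulation and does not use any real graphics API; the names `DeviceLost`, `SimulatedGpu`, `Application`, and `fail_after` are hypothetical and are introduced solely for this example. It assumes that the application keeps its own record of issued commands so that, upon notification of a failure, it can reissue them to another GPU.

```python
class DeviceLost(Exception):
    """Hypothetical stand-in for a "DeviceLost" error reported by a graphics API."""


class SimulatedGpu:
    """Toy device that fails once after a fixed number of submissions (assumption)."""

    def __init__(self, name, fail_after=None):
        self.name = name
        self.fail_after = fail_after  # None means the device never fails
        self.executed = []            # commands currently held by the device

    def submit(self, command):
        if self.fail_after is not None and len(self.executed) >= self.fail_after:
            # On failure, everything the device held is lost.
            self.executed = []
            self.fail_after = None
            raise DeviceLost(f"{self.name}: commands and resources lost")
        self.executed.append(command)


class Application:
    """Application that bears the full responsibility for recovery."""

    def __init__(self, gpu, fallback_gpu):
        self.gpu = gpu
        self.fallback_gpu = fallback_gpu
        self.issued = []      # the application's own record of the graphics state
        self.recoveries = 0

    def draw(self, command):
        try:
            self.gpu.submit(command)
            self.issued.append(command)
        except DeviceLost:
            self._recover(command)

    def _recover(self, pending):
        # Switch to another GPU, reissue every earlier command in order,
        # then retry the command that triggered the notification.
        self.recoveries += 1
        self.gpu = self.fallback_gpu
        for command in self.issued:
            self.gpu.submit(command)
        self.gpu.submit(pending)
        self.issued.append(pending)


# Example run: the primary device fails on the third submission.
primary = SimulatedGpu("gpu0", fail_after=2)
backup = SimulatedGpu("gpu1")
app = Application(primary, backup)
for cmd in ["create_texture", "set_shader", "draw_triangle"]:
    app.draw(cmd)
```

In this sketch, every application must maintain its own command record and replay logic. An application that omits or mishandles the `except DeviceLost` path simply loses its graphics state, which is one way the inconsistent results noted above can arise.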