Graphics Processing Units (GPUs) have emerged as a power-efficient computing platform for general-purpose computing on GPUs (GPGPU) as well as for 3D graphics. Although the reliability requirements of 3D graphics are not currently critical, GPGPU software requires fault-tolerance capabilities similar to those of Central Processing Units (CPUs), including robust fault detection to prevent Silent Data Corruption (SDC). A GPU is a massively parallel machine that employs large static random-access memory (SRAM) arrays. In such systems, traditional protection mechanisms such as error-correcting codes (ECC) incur a non-negligible area overhead, so a lower-cost fault detection mechanism is needed.
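To make the area argument concrete, the sketch below estimates the storage overhead of a common ECC choice, a single-error-correcting, double-error-detecting (SEC-DED) Hamming code, and compares it with a single parity bit, which detects any odd number of bit flips but corrects nothing. The text does not say which ECC scheme it has in mind, so SEC-DED and the word widths used here are illustrative assumptions. The check-bit count follows the Hamming bound 2^r >= k + r + 1 for k data bits, plus one overall parity bit for double-error detection.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC code extended with one overall parity bit (SEC-DED)."""
    r = 0
    # Smallest r satisfying the single-error-correction bound 2**r >= k + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # the extra overall parity bit adds double-error detection


def overhead(check_bits: int, data_bits: int) -> float:
    """Extra storage as a fraction of the protected data bits."""
    return check_bits / data_bits


if __name__ == "__main__":
    # Illustrative word widths (an assumption, not taken from the text).
    for k in (32, 64, 128):
        c = secded_check_bits(k)
        print(f"{k}-bit word: SEC-DED {c} bits ({overhead(c, k):.1%}), "
              f"parity 1 bit ({overhead(1, k):.1%})")
```

For narrow words the SEC-DED overhead is several times that of parity, which is why per-word ECC over large SRAM arrays is costly in area.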
As GPU operating voltages continue to drop and near-threshold operation becomes a design option for controlling the power envelope, both the on-chip memories and the logic of the GPU need protection. Fault protection is especially critical in market segments such as servers, cloud computing, and real-time embedded systems, where GPUs play an increasingly large role in running GPGPU applications.