1. Field
The present disclosure is generally directed to computing systems. More particularly, the present disclosure is directed to redundant multithreading within a computing system.
2. Background Art
The desire to use a graphics processing unit (GPU) for general computation has become much more pronounced recently due to the GPU's exemplary performance per unit power and/or cost. The computational capabilities of GPUs, generally, have grown at a rate exceeding that of the corresponding central processing unit (CPU) platforms. This growth, coupled with the explosion of the mobile computing market and its necessary supporting server/enterprise systems, has been used to provide a desired quality of user experience. Consequently, the combined use of CPUs and GPUs for executing workloads with data parallel content is becoming a volume technology.
However, GPUs have traditionally operated in a constrained programming environment, available only for the acceleration of graphics. These constraints arose from the fact that GPUs did not have as rich a programming ecosystem as CPUs. Their use, therefore, has been mostly limited to two-dimensional (2-D) and three-dimensional (3-D) graphics and a few leading-edge multimedia applications, which are already accustomed to dealing with graphics and video application programming interfaces (APIs).
With the advent of multi-vendor supported OpenCL® and DirectCompute®, standard APIs and supporting tools, GPU use has been extended beyond traditional graphics. Although OpenCL and DirectCompute are a promising start, there are many hurdles remaining to creating an environment and ecosystem that allows the combination of the CPU and GPU to be used as fluidly as the CPU for most programming tasks.
One hurdle remaining is to ensure high reliability when performing general purpose computations on a GPU. For example, use of a GPU in High Performance Computing (HPC) systems requires that the hardware be sufficiently reliable to tolerate faults without causing application errors or system crashes. Thus, in order to ensure high reliability, a mechanism is needed to perform redundant computation on the GPU.
Redundant multithreading (RMT) is one approach to improving reliability in high performance GPUs. RMT techniques must provide: (1) a method to ensure that redundant threads see the same load values from the memory subsystem; (2) a method to compare the outputs of redundant threads for correctness; and (3) a method to coalesce the outputs of redundant threads such that only one thread actually updates the memory subsystem.
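The three requirements listed above can be illustrated with a minimal software sketch. It is a sequential Python model written only for illustration, not GPU code and not the mechanism of the present disclosure. All names in it (Memory, run_redundant, fault_in_copy) are hypothetical, and the single-bit flip used to inject a fault is an assumption made so that the comparison step has something to detect.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Toy memory subsystem (hypothetical) that counts committed stores."""
    cells: dict = field(default_factory=dict)
    store_count: int = 0

    def load(self, addr):
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        self.cells[addr] = value
        self.store_count += 1


def run_redundant(memory, addr_in, addr_out, compute, fault_in_copy=None):
    """Run `compute` twice on the same input and commit only if both agree.

    `fault_in_copy` (0 or 1) optionally flips the low bit of one copy's
    result to model a transient fault. Returns True if the store committed.
    """
    # (1) Load once and replicate, so both copies see the same load value.
    loaded = memory.load(addr_in)
    inputs = [loaded, loaded]

    outputs = []
    for copy_id, value in enumerate(inputs):
        result = compute(value)
        if fault_in_copy == copy_id:
            result ^= 1  # injected single-bit error (illustrative assumption)
        outputs.append(result)

    # (2) Compare the outputs of the redundant copies.
    if outputs[0] != outputs[1]:
        return False  # mismatch detected; nothing is written to memory

    # (3) Coalesce: a single store updates memory on behalf of both copies.
    memory.store(addr_out, outputs[0])
    return True


if __name__ == "__main__":
    mem = Memory({0: 20})
    print(run_redundant(mem, 0, 1, lambda x: x * 2 + 1))                   # fault-free run
    print(run_redundant(mem, 0, 2, lambda x: x * 2 + 1, fault_in_copy=1))  # faulty run
    print(mem.cells, mem.store_count)
```

In this model the fault-free run commits exactly one store, while the run with an injected fault is rejected at the comparison step and leaves memory unchanged. A real GPU implementation would execute the copies as concurrent threads, which is where the synchronization and buffering costs arise.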
Traditional RMT techniques require either comparing every instruction of a thread with its redundant copy or comparing every store of a thread with its redundant copy. These traditional approaches require significant changes to the GPU hardware architecture. Specifically, comparing every instruction or store in parallel requires a significant amount of additional hardware. In addition, these comparisons require the redundant threads to be synchronized. This is because the GPU hardware has limited resources to store one thread's instruction results while waiting for the redundant thread's identical instruction to complete execution. The resulting increase in hardware design complexity of the GPU means an increase in the cost and power requirements of the GPU. In addition, the instruction and store comparisons incur a significant performance impact.