A GPU may be used to rapidly execute code to accelerate the creation of images in a frame buffer for output to a display device, such as a smartphone display. A GPU is highly complex because a large number of computing threads must be executed in parallel to meet the performance demands of the display device. A GPU may be a single instruction multiple thread (SIMT) machine that uses an instruction set architecture in which a single instruction is executed concurrently on several threads. A GPU using SIMT is designed to limit instruction-fetching overhead, i.e., the latency that results from memory access, and uses “latency hiding” to enable high-performance execution despite considerable latency in memory-access operations. An SIMT machine may include a processing element that executes instruction streams in a non-stallable fashion.
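The SIMT execution model described above can be illustrated with a minimal Python sketch. This is a sequential simulation for illustration only, not a description of any actual GPU hardware; the names `simt_execute`, `program`, and `thread_registers` are hypothetical. It shows each instruction being fetched once and then applied to every thread's private registers.

```python
def simt_execute(program, thread_registers):
    """Apply each instruction to all threads before moving to the next one.

    Returns the number of instruction fetches, which depends only on the
    program length and not on the number of threads (an assumption of this
    simplified model).
    """
    fetches = 0
    for instruction in program:        # one fetch per instruction...
        fetches += 1
        for regs in thread_registers:  # ...applied to every thread's registers
            instruction(regs)
    return fetches


# Four hypothetical threads, each with a thread ID and one register "x".
thread_registers = [{"tid": t, "x": 0} for t in range(4)]

# A two-instruction program: x = tid * 2, then x = x + 1.
program = [
    lambda r: r.update(x=r["tid"] * 2),
    lambda r: r.update(x=r["x"] + 1),
]

fetches = simt_execute(program, thread_registers)
print(fetches, [r["x"] for r in thread_registers])
```

In this sketch the fetch count stays at two whether four or 32 threads are simulated, which is the sense in which fetch overhead is shared among threads.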
A GPU may use, for example, eight spatial lanes operating over four time cycles to process 32 threads. A thread generally refers to a point of control that executes instructions. Processing multiple threads may complicate the management of GPU chip area and dynamic power consumption. The power consumed by any GPU feature is multiplied by the number of threads being executed; hence, a small increase in the power consumption of a feature results in a large increase in overall GPU power consumption. Similarly, a reduction in the power consumption of a feature is also multiplied by the number of threads executed, resulting in a correspondingly large reduction in overall power consumption.
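The arithmetic in this paragraph can be sketched in Python. The function names and the per-thread power figures are hypothetical, chosen only for illustration, and the sketch assumes that threads fill lanes in order and that a feature's power scales linearly with the thread count.

```python
def thread_schedule(lanes=8, cycles=4):
    """Map each thread ID to a (cycle, lane) slot, filling lanes in order."""
    return {
        cycle * lanes + lane: (cycle, lane)
        for cycle in range(cycles)
        for lane in range(lanes)
    }


def total_feature_power(per_thread_power_uw, num_threads):
    """Power of one feature in microwatts, assuming linear scaling with threads."""
    return per_thread_power_uw * num_threads


schedule = thread_schedule()
num_threads = len(schedule)  # 8 lanes x 4 cycles = 32 threads

# Hypothetical figures: a feature costing 500 uW per thread, raised by 100 uW.
baseline = total_feature_power(500, num_threads)
increased = total_feature_power(600, num_threads)

print(num_threads)           # 32
print(schedule[13])          # (1, 5): thread 13 runs in cycle 1 on lane 5
print(increased - baseline)  # 3200: the 100 uW change is multiplied by 32
```

Under these assumptions, a 100 microwatt per-thread change becomes a 3200 microwatt change overall, and a per-thread saving of the same size would be multiplied in the same way.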