Graphics processing involves performing rapid mathematical calculations for image rendering. Such graphics workloads may be performed on a graphics processing unit (GPU), a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. The size of the register file (that is, the number of physical registers) available on current GPU designs has a large impact on both GPU performance and power consumption.
To sustain the increasing throughput demands of contemporary graphics workloads, GPUs rely on highly parallel execution of multiple hardware contexts. In such parallel execution, each context has a dedicated register file to enable fast context switching. Thus, if the number of registers allocated to a hardware context is too small, a large workload will spill register contents to main memory, resulting in an undesired performance penalty. At the same time, it is not feasible to simply provide a very large number of on-chip registers, owing to the associated hardware cost and power consumption constraints.
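The trade-off described above can be illustrated with a minimal sketch. It assumes a simplified model in which a fixed pool of on-chip registers is divided evenly among hardware contexts, and any registers a workload needs beyond its allocation are spilled to main memory. The function name, its parameters, and all numeric values are hypothetical and chosen only for illustration; they do not describe any particular GPU.

```python
def partition_register_file(total_registers, registers_per_context, registers_needed):
    """Simplified, hypothetical model of a statically partitioned register file.

    Returns a tuple (resident_contexts, spilled_registers_per_context):
      - resident_contexts: how many hardware contexts fit on chip at once
      - spilled_registers_per_context: registers each context must keep
        in main memory because its on-chip allocation is too small
    """
    if total_registers <= 0 or registers_per_context <= 0 or registers_needed < 0:
        raise ValueError("register counts must be positive (needed may be zero)")

    # Each context owns a dedicated slice, so the slice size bounds parallelism.
    resident_contexts = total_registers // registers_per_context

    # Anything the workload needs beyond its slice spills to main memory.
    spilled = max(0, registers_needed - registers_per_context)
    return resident_contexts, spilled


if __name__ == "__main__":
    TOTAL = 4096   # assumed on-chip register budget (illustrative only)
    NEEDED = 48    # assumed registers required by one workload (illustrative only)
    for allocation in (16, 32, 64):
        contexts, spilled = partition_register_file(TOTAL, allocation, NEEDED)
        print(f"{allocation} regs/context: {contexts} contexts, {spilled} spilled")
```

Under these assumed numbers, a small allocation keeps many contexts resident but forces spills, whereas a large allocation avoids spills at the cost of fewer resident contexts. Removing both penalties at once would require enlarging the register pool itself, which is what the hardware cost and power constraints rule out.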