A graphics processing unit (GPU) computing system combines a GPU with a Central Processing Unit (CPU). Operating a graphics pipeline efficiently on such a system therefore involves distributing the workload between the CPU and the GPU in a way that accounts for the differences between their two architectures.
One difference between a GPU and a CPU is how each architecture handles access to off-chip memory. Such an access is expensive and can incur hundreds of clock cycles of latency. CPUs allocate many transistors to on-chip caches to reduce the overall latency caused by off-chip memory accesses.
GPUs, in contrast, devote more transistors to arithmetic units than to on-chip cache, which gives them much higher arithmetic throughput than CPUs. The trade-off is that, with less cache available to serve requests on chip, off-chip memory accesses on a GPU incur higher latency than on a CPU.
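The effect of cache capacity on memory latency can be illustrated with the standard textbook average memory access time (AMAT) model, AMAT = hit time + miss rate × miss penalty. The model, the function name `amat`, and every number below are assumptions chosen for illustration; they are not taken from this document and do not describe any real processor. The sketch compares a hypothetical cache-heavy (CPU-like) design, where most requests hit on chip, with a hypothetical cache-light (GPU-like) design, where more requests fall through to off-chip memory.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time in clock cycles (textbook AMAT model).

    AMAT = hit_time + miss_rate * miss_penalty
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures for illustration only; not measurements of any real chip.
OFF_CHIP_PENALTY = 400.0  # assumed cost, in cycles, of one off-chip memory access

# Larger caches (CPU-like) -> fewer requests miss and go off chip.
cache_heavy = amat(hit_time=4.0, miss_rate=0.02, miss_penalty=OFF_CHIP_PENALTY)
# Smaller caches (GPU-like) -> more requests miss and pay the off-chip penalty.
cache_light = amat(hit_time=4.0, miss_rate=0.30, miss_penalty=OFF_CHIP_PENALTY)

print(f"cache-heavy (CPU-like): {cache_heavy:.1f} cycles")  # 12.0 cycles
print(f"cache-light (GPU-like): {cache_light:.1f} cycles")  # 124.0 cycles
```

With the same off-chip penalty, the miss rate alone drives the roughly tenfold difference in average latency. The model captures only access latency; it says nothing about the arithmetic throughput that the GPU gains by spending those transistors on arithmetic units instead.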