Hardware for rendering three-dimensional graphics is highly parallel and includes a large number of individual processing units that request data from memory, perform calculations on the data, and provide processed data to a frame buffer for output to a screen. Accessing data in memory typically incurs significant latency, and cache systems are provided to reduce that latency. However, because rendering operations typically process large amounts of data, further reductions in memory access latency are desirable.