A multi-media processor may include a central processing unit (CPU), a graphics processing unit (GPU), a video processing unit, a still-image processing unit, and an audio processing unit. A GPU, for example, is a dedicated graphics rendering device utilized to manipulate and display computerized graphics on a display. GPUs are built with a highly parallel structure that provides more efficient processing than typical, general-purpose CPUs for a range of complex graphics-related algorithms, such as algorithms that operate on representations of three-dimensional computerized graphics. A GPU may implement a number of so-called “primitive” graphics operations, such as operations that form points, lines, and triangles, to create complex, three-dimensional images on a display more quickly than drawing the images directly to the display with a CPU. GPUs may be used in a wide variety of applications, and are very common in graphics-intensive applications, such as video gaming.
GPUs typically include a number of pipeline stages, such as one or more shader stages, setup stages, rasterizer stages, and interpolation stages. At each of these stages, a GPU may utilize data stored in an external, i.e., off-chip, memory. For example, after primitive shapes formed within the image data are converted into pixels by a rasterizer stage, pixel rejection may be performed to eliminate those portions of the primitive shapes that are hidden in an image frame. The pixel rejection may be based on a comparison between a depth value that has been interpolated for a given pixel and a recorded depth value for that pixel, which is fetched from either an external memory or an embedded memory. If the comparison function returns true, meaning that the interpolated depth value is less than (i.e., in front of) the recorded depth value, then the interpolated depth value of the pixel should be written to the memory. If the function returns false, meaning that the interpolated depth value is greater than (i.e., behind) the recorded depth value, the recorded depth value of the pixel should remain unchanged and processing of that pixel is halted immediately.
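The depth comparison described above can be illustrated with a minimal sketch. The function name `depth_test`, the nested-list depth buffer, and the convention that smaller values are nearer the viewer are assumptions made for illustration only; they are not taken from any particular GPU. The sketch also assumes a strict less-than comparison, so a pixel whose interpolated depth equals the recorded depth is rejected.

```python
def depth_test(depth_buffer, x, y, interpolated_depth):
    """Hypothetical per-pixel depth test (illustrative only).

    depth_buffer is a list of rows holding the recorded depth of each pixel.
    Returns True if the pixel passes and its depth was written back,
    or False if the pixel is rejected and the buffer is left unchanged.
    """
    # Read: fetch the recorded depth value for this pixel.
    recorded_depth = depth_buffer[y][x]

    # Compare: a smaller depth is assumed to be in front of a larger one.
    if interpolated_depth < recorded_depth:
        # Write: the new pixel is visible, so record its depth.
        depth_buffer[y][x] = interpolated_depth
        return True

    # The pixel is hidden (or at equal depth); stop processing it.
    return False
```

Each call performs one read of the recorded depth and, when the test passes, one write of the new depth, which is the read-modify-write pattern at issue for memory traffic.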
At processing stages that perform read-modify-write functions using data stored in either an external memory or an embedded memory, a multi-media processor repeatedly accesses the memory, reading a data value from the memory and then writing an updated data value back to the memory. Frequent memory accesses cause the multi-media processor to consume large amounts of power and access bandwidth. To reduce this consumption, a multi-media processor may include at least one cache that stores copies of the data held at frequently used memory locations. In this way, the multi-media processor may reduce the number of memory accesses by performing data value read and write operations on the local cache instead.
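The effect of such a cache on memory traffic can be sketched in software. The classes `CountingMemory` and `WriteBackCache` below are hypothetical names introduced for illustration, and the write-back policy with least-recently-used eviction is an assumed design choice rather than one stated in the text. The sketch counts how many accesses actually reach the backing memory.

```python
from collections import OrderedDict


class CountingMemory:
    """Hypothetical backing memory that counts every access it serves."""

    def __init__(self, size, fill=1.0):
        self.cells = [fill] * size
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.writes += 1
        self.cells[addr] = value


class WriteBackCache:
    """Hypothetical small cache placed in front of a CountingMemory.

    Assumes a write-back policy with least-recently-used eviction.
    """

    def __init__(self, memory, capacity):
        self.memory = memory
        self.capacity = capacity
        self.lines = OrderedDict()  # addr -> [value, dirty flag]

    def _fill(self, addr):
        # Hit: mark the entry as most recently used.
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return
        # Miss with a full cache: evict the least recently used entry,
        # writing it back to memory only if it was modified.
        if len(self.lines) >= self.capacity:
            old_addr, (old_value, dirty) = self.lines.popitem(last=False)
            if dirty:
                self.memory.write(old_addr, old_value)
        # Fetch the requested value from memory into the cache.
        self.lines[addr] = [self.memory.read(addr), False]

    def read(self, addr):
        self._fill(addr)
        return self.lines[addr][0]

    def write(self, addr, value):
        self._fill(addr)
        line = self.lines[addr]
        line[0] = value
        line[1] = True  # memory copy is now stale

    def flush(self):
        # Write every modified entry back to memory.
        for addr, line in self.lines.items():
            if line[1]:
                self.memory.write(addr, line[0])
                line[1] = False
```

Under these assumptions, repeated read-modify-write operations on the same location are served from the cache after the first miss, and the memory sees a single write when the entry is evicted or flushed, instead of one read and one write per operation.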