Graphical images are often generated in several steps. For example, an image may be created and then read to create another image. Referring to FIG. 1, consider an application running on a graphics processing unit (GPU) 100 that includes a render target (RT) step. The GPU 100 includes an on-chip cache hierarchy 105, which may include different cache levels, such as L1 and L2 cache memory. Additionally, the GPU 100 can access an external memory 110. An RT is an intermediate memory surface to which a 3D image is rendered. A render target texture (RTT) is an RT that can be read as an input texture by a pixel shader in the GPU 100.
When rendering to textures, a sequence of steps may be performed to create RT “A” and then read RT A to create RT “B”. For example, one possibility is to create an image and then add motion blur to it. Another possibility is to create a G-buffer (holding per-pixel lighting parameters) and then create the lit image from it.
However, these render target steps conventionally require access to external memory 110. Consider a graphics application that produces an intermediate image A and then reads image A to produce image B. Given common image sizes (e.g., 1920×1080 pixels) and assuming 4 bytes per pixel (RGBA8888 format), the intermediate image occupies about 8 MB (1920 × 1080 × 4 = 8,294,400 bytes). Because cache sizes on conventional GPUs are not large enough to hold this much data, the intermediate image would have to be written to external memory. Thus image A, even if broken down into smaller tiles (e.g., 64×64-pixel tiles, where a tile corresponds to a rectangular region of the screen), would have to be written to external memory 110 and then read back by the GPU 100 to produce image B.
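The size arithmetic in the preceding paragraph can be checked with a short sketch. It assumes the example values given there (a 1920×1080 surface, 4 bytes per pixel in RGBA8888, 64×64-pixel tiles); the function names `surface_bytes` and `tile_grid` are hypothetical and used only for illustration.

```python
# Sketch: size of an intermediate render target and its tiling.
# Assumptions (from the example above): 1920x1080 surface, RGBA8888
# (4 bytes per pixel), 64x64-pixel tiles. Function names are hypothetical.
import math

BYTES_PER_PIXEL = 4  # RGBA8888: one byte each for R, G, B, A


def surface_bytes(width: int, height: int, bpp: int = BYTES_PER_PIXEL) -> int:
    """Total bytes needed to store a width x height render target."""
    return width * height * bpp


def tile_grid(width: int, height: int, tile: int = 64) -> tuple[int, int]:
    """Number of tile columns and rows; partial edge tiles count as whole tiles."""
    return math.ceil(width / tile), math.ceil(height / tile)


if __name__ == "__main__":
    total = surface_bytes(1920, 1080)
    cols, rows = tile_grid(1920, 1080)
    print(total)                     # 8294400 bytes, roughly 8 MB
    print(cols, rows, cols * rows)   # 30 17 510
    print(surface_bytes(64, 64))     # 16384 bytes (16 KiB) per tile
```

A single 16 KiB tile is small, but all 510 tiles of image A together still total about 8 MB. This is why tiling alone does not keep the intermediate image on chip when image A is produced in full before image B is rendered.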
Thus, in the prior art, a graphics processor would normally render all of a first RT (e.g., RT A), write it to external memory 110, and then read it back from external memory 110 to create a second RT (e.g., RT B). This approach has the drawback of generating substantial traffic to and from the external memory 110. Additionally, this process can include rendering portions of intermediate images that are not needed.
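To give a rough sense of the traffic involved, the following sketch estimates the external-memory traffic caused by the intermediate image alone in the conventional approach. It rests on hypothetical assumptions not stated in the text: 1920×1080 RGBA8888 surfaces, 60 frames per second, every pixel of RT A written once and read back once, and no compression, no cache hits, and no traffic counted for RT B's own output or for other inputs. The function names are illustrative only.

```python
# Sketch: external-memory traffic for the conventional two-pass approach,
# in which all of RT A is written out and then read back to render RT B.
# Hypothetical assumptions: 1920x1080 RGBA8888 surfaces, 60 frames/s,
# each pixel of RT A written once and read once, no compression or cache hits.

BYTES_PER_PIXEL = 4  # RGBA8888


def intermediate_traffic_bytes(width: int, height: int,
                               bpp: int = BYTES_PER_PIXEL) -> int:
    """Bytes moved to and from external memory for one intermediate RT per frame."""
    size_a = width * height * bpp
    write_a = size_a  # pass 1: render all of RT A and write it to external memory
    read_a = size_a   # pass 2: read RT A back from external memory to render RT B
    return write_a + read_a


def traffic_per_second(width: int, height: int, fps: int) -> int:
    """Intermediate-image traffic sustained at a given frame rate."""
    return intermediate_traffic_bytes(width, height) * fps


if __name__ == "__main__":
    print(intermediate_traffic_bytes(1920, 1080))   # 16588800 bytes per frame
    print(traffic_per_second(1920, 1080, fps=60))   # 995328000 bytes/s, roughly 1 GB/s
```

Under these assumptions, a single intermediate render target accounts for roughly 1 GB/s of external-memory traffic. Applications that chain several intermediate targets would incur this cost once per target.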
Embodiments of the present invention were developed in view of the deficiencies in the prior art.