Current parallel graphics data processing includes systems and methods developed to perform specific operations on graphics data, such as linear interpolation, tessellation, rasterization, texture mapping, and depth testing. Traditionally, graphics processors used fixed-function computational units to process graphics data; more recently, however, portions of graphics processors have been made programmable, enabling such processors to support a wider variety of operations for processing vertex and fragment data.
During rasterization processing, graphical objects are processed so that a representation of each object is included within a resultant image in which the object is properly displayed relative to all of the other objects visible within the scene. In many prior systems, each graphical object is completely rasterized, or the blocks of pixels are processed in a more random sequence that may be dictated by the processing of the graphical objects in earlier parts of the graphics pipeline. As such, the graphics processor (e.g., graphics processing unit/GPU) would need multiple buffers 600: one image buffer to contain the image being scanned out, and another to contain the next image being processed as the graphical objects are rasterized. By scheduling the processing of blocks of image pixel data in the same order in which the image pixel data is scanned out, and by scheduling the processing of these blocks such that rasterization of each block has completed before scanning of that block's pixel data starts, data processing latency is reduced while less image buffer memory is used. The present invention overcomes the inefficiencies of prior systems as disclosed herein.
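By way of illustration only, the timing constraint described above can be sketched in a few lines of Python. The sketch is not the disclosed implementation. It assumes a single rasterizer that processes block rows one after another in scan-out order, a uniform rasterization time per block row, and a uniform scan-out time per block row. The names `schedule_blocks`, `meets_deadlines`, and `min_lead_time`, and all time values (in arbitrary integer units), are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockSlot:
    row: int        # block row index, in scan-out order (top to bottom)
    finish: int     # time at which rasterization of this row completes
    scanout: int    # time at which scan-out of this row begins


def schedule_blocks(num_rows: int, raster_time: int, scan_time: int,
                    scan_start: int) -> List[BlockSlot]:
    """Rasterize block rows back to back, in the same order they are scanned out.

    Assumption: rasterization starts at time 0 and each row costs the same.
    """
    slots = []
    for row in range(num_rows):
        finish = (row + 1) * raster_time          # rows are processed sequentially
        scanout = scan_start + row * scan_time    # display reaches this row here
        slots.append(BlockSlot(row, finish, scanout))
    return slots


def meets_deadlines(slots: List[BlockSlot]) -> bool:
    """True if every block row is fully rasterized before its scan-out begins."""
    return all(slot.finish <= slot.scanout for slot in slots)


def min_lead_time(num_rows: int, raster_time: int, scan_time: int) -> int:
    """Smallest scan_start for which every row meets its scan-out deadline."""
    return max((row + 1) * raster_time - row * scan_time
               for row in range(num_rows))


if __name__ == "__main__":
    lead = min_lead_time(num_rows=4, raster_time=2, scan_time=3)
    slots = schedule_blocks(4, 2, 3, scan_start=lead)
    print(lead, meets_deadlines(slots))
```

Under these assumptions, when rasterizing a block row is no slower than scanning it out, scan-out only needs to lag rasterization by the time of a single block row, rather than by a full frame held in a second image buffer.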