In conventional graphics processing systems, an object to be displayed is typically represented as a set of one or more graphics primitives. Examples of graphics primitives include one-dimensional graphics primitives, such as lines, and two-dimensional graphics primitives, such as polygons. Portions of an object to be displayed are frequently moved from one display location to another. Such a move is carried out by copying a source pixel area to a destination pixel area, which is referred to as a blit operation. A GPU may respond to an instruction to perform a blit operation by performing a read operation to read data in memory locations corresponding to the source pixel area, followed by a write operation to write the data to memory locations corresponding to the destination pixel area. The instruction for a blit operation may specify coordinates to identify the location of the source pixel area, as well as coordinates to identify the location of the destination pixel area.
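The read-then-write behavior described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the frame buffer is a list of pixel rows and that the instruction supplies the top-left coordinates of both areas together with a width and height; the function name `blit` and its parameters are hypothetical and do not correspond to any particular GPU interface.

```python
def blit(framebuffer, src_x, src_y, dst_x, dst_y, width, height):
    """Copy a width x height pixel area from (src_x, src_y) to (dst_x, dst_y).

    Hypothetical model: `framebuffer` is a list of rows, each a list of pixels.
    """
    # Read operation: fetch every pixel of the source area into a temporary copy.
    source = [row[src_x:src_x + width]
              for row in framebuffer[src_y:src_y + height]]

    # Write operation: store the fetched pixels into the destination area.
    for dy, row in enumerate(source):
        framebuffer[dst_y + dy][dst_x:dst_x + width] = row
```

Because every read in this sketch completes before any write begins, the result is correct even when the two areas overlap, at the cost of a temporary buffer as large as the source area.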
Within a single blit, if the destination pixel area overlaps the source pixel area, the reads and writes must be ordered so that each pixel lying in both the source and destination pixel areas is read before it is overwritten by a destination write. This is the traditional blit correctness problem. Techniques for solving this problem in serial processing systems are known. It would be desirable to solve this problem in a parallel processing system as well.
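The overlap hazard, and one commonly used serial remedy, can be sketched on a single row of pixels. The sketch assumes an in-place copy that reads and writes one pixel at a time; the function names are hypothetical. The remedy shown is direction-aware copying, the same idea used by overlap-safe memory move routines, and is offered only as an example of a known serial approach.

```python
def copy_ascending(pixels, src, dst, count):
    """Naive in-place copy that always walks from the lowest index upward."""
    for i in range(count):
        pixels[dst + i] = pixels[src + i]


def copy_overlap_safe(pixels, src, dst, count):
    """In-place copy that picks a direction so shared pixels are read first."""
    if dst > src:
        # Destination lies after the source: copy from the last pixel backward
        # so that no source pixel is overwritten before it has been read.
        indices = range(count - 1, -1, -1)
    else:
        # Destination lies before (or at) the source: copy forward.
        indices = range(count)
    for i in indices:
        pixels[dst + i] = pixels[src + i]


# Copy four pixels from index 0 to index 2; indices 2 and 3 are shared.
naive = [1, 2, 3, 4, 5, 0, 0]
copy_ascending(naive, 0, 2, 4)     # naive is now [1, 2, 1, 2, 1, 2, 0]

safe = [1, 2, 3, 4, 5, 0, 0]
copy_overlap_safe(safe, 0, 2, 4)   # safe is now [1, 2, 1, 2, 3, 4, 0]
```

In the naive case the pixels at indices 2 and 3 are overwritten before they are read, so the tail of the destination repeats the first two source pixels instead of receiving 3 and 4.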
Performance demands are driving increased parallel processing in GPUs. Parallel processing raises particular challenges for blit operations. Efficient parallel processing requires out-of-order execution of operations whenever possible. However, out-of-order execution of blit operations may result in stale data being read and valid data being overwritten before it has been read.
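Both failure modes can be reproduced with a small simulation, offered here only as an illustrative model. It assumes each operation is a single-pixel copy expressed as a hypothetical `(source_index, destination_index)` pair, and it models out-of-order execution simply by applying the same operations in a different sequence.

```python
def run_copies(pixels, copies, order):
    """Apply single-pixel copies to a copy of `pixels` in the given order.

    `copies` is a list of (source_index, destination_index) pairs and
    `order` lists the positions in `copies` in the sequence they execute.
    """
    result = list(pixels)
    for k in order:
        src, dst = copies[k]
        result[dst] = result[src]
    return result


pixels = ["a", "b", "-", "-"]

# Case 1: the second copy depends on data produced by the first.
dependent = [(0, 1), (1, 2)]
in_order_1 = run_copies(pixels, dependent, [0, 1])   # ['a', 'a', 'a', '-']
reordered_1 = run_copies(pixels, dependent, [1, 0])  # ['a', 'a', 'b', '-']

# Case 2: the second copy overwrites data the first still needs to read.
overwriting = [(1, 2), (0, 1)]
in_order_2 = run_copies(pixels, overwriting, [0, 1])   # ['a', 'a', 'b', '-']
reordered_2 = run_copies(pixels, overwriting, [1, 0])  # ['a', 'a', 'a', '-']
```

In the first case the reordered run reads the stale value "b" at index 1 before the earlier copy has updated it. In the second case the reordered run overwrites the valid value "b" at index 1 before it has been read, so that value is lost.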
It would be desirable to extend the performance benefits of parallel processing to blit operations. However, any such parallel processing of blit operations must preserve data integrity. That is, any such parallel processing of blit operations must be accomplished without incurring errors in the sequencing of read and write operations.