Graphics operations typically require a read-modify-write sequence. For example, in source/destination blending, a translucent surface is rendered by reading the solid surface color from the frame buffer memory and then arithmetically blending it with the translucent color. The blended result is then written back to the same location in the frame buffer. In plane masking, a bit mask is used such that only specified bits of a pixel are overwritten by new data; in general this requires reading the destination, merging the new data, and writing the result. In Z buffering, hidden surface removal is accomplished by reading the destination Z value and arithmetically comparing it to a source Z value to determine acceptability before conditionally writing the source Z value back to the frame buffer.
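The three operations above can be sketched as follows. This is only an illustrative sketch: the function names, the 8-bit channel width, and the convention that a smaller Z value is nearer to the viewer are assumptions made for the example and are not taken from any particular system.

```python
def blend(dst, src, alpha):
    """Blend one 8-bit channel; alpha in [0, 1] is the weight of the source."""
    return round(alpha * src + (1.0 - alpha) * dst)


def plane_mask_write(dst, src, mask):
    """Overwrite only the bits of dst that are selected by mask."""
    return (dst & ~mask) | (src & mask)


def z_test_write(z_buffer, addr, src_z):
    """Conditionally write src_z; assumes a smaller Z is nearer the viewer."""
    dst_z = z_buffer[addr]        # read the destination Z value
    if src_z < dst_z:             # compare source against destination
        z_buffer[addr] = src_z    # write back only if the source is accepted
        return True
    return False
```

In each case the result depends on the value currently stored at the destination, which is why the destination must be read before it is written.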
A read/write turnaround typically involves switching the frame buffer memory bus from reading data to writing data, and then back to reading data. Each read incurs a latency or delay between issuing a request to read data from a particular address and actually receiving the data from the frame memory. Similarly, once data is to be written back to the frame memory, the frame buffer memory bus must be switched from read to write, and another latency or delay may be experienced as the bus is put in high impedance mode for one or more cycles. These delays can be significant.
To reduce the impact of these latencies, it has been proposed that a number of reads and writes be batched together to decrease the number of read/write turnarounds. The actual length of the batch of reads or writes is limited only by the system's physical hardware.
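The effect of batching on the number of bus direction changes can be illustrated with a small counting model. The sketch below assumes one pixel per read or write cycle and ignores all other timing effects; `bus_schedule` and `count_turnarounds` are hypothetical names used only for this example.

```python
def bus_schedule(num_pixels, batch_size):
    """Return the bus operations ('R' or 'W') for read-modify-write
    processing of num_pixels pixels in batches of batch_size."""
    ops = []
    for start in range(0, num_pixels, batch_size):
        count = min(batch_size, num_pixels - start)
        ops.extend("R" * count)   # all reads of the batch
        ops.extend("W" * count)   # then all writes of the batch
    return ops


def count_turnarounds(ops):
    """Count how often the bus switches between reading and writing."""
    return sum(1 for prev, cur in zip(ops, ops[1:]) if prev != cur)


for batch in (1, 4, 8):
    print(batch, count_turnarounds(bus_schedule(8, batch)))
```

Under this model, eight pixels processed one at a time require fifteen direction changes, batches of four require three, and a single batch of eight requires one.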
However, if multiple graphical objects are rendered, one object may completely or partially overlap another. For example, one object may be "in front of" another object when viewed. This may create a consistency problem for batching: if the modified pixel data for a first object has yet to be written to the frame memory, then reads for a second, overlapping object could use the old, i.e., unmodified, data from the frame buffer. This is an unacceptable error.
For example, in a system having a single pixel at each address, suppose that a single pixel is read during each read cycle and a single pixel is written during each write cycle. Assume that the first four pixels being read are associated with one object, the next four pixels being read are associated with another object, and the fourth and seventh pixels read are at the same pixel address. If the pixel data obtained by the fourth read has been modified but has not yet been written back to the frame buffer, the seventh read will return stale data from the frame buffer rather than the modified pixel data.
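The hazard in this example can be reproduced with a toy model. The sketch below is illustrative only: the frame buffer is a plain list, the modification is a simple increment chosen so that a lost update is easy to see, and the address sequence is an assumed one in which the fourth and seventh reads both target address 3.

```python
def render(frame_buffer, addresses, batch_size, modify):
    """Process read-modify-write operations in batches.

    Within a batch, all reads are issued before any write, which is
    what allows a later read to observe stale data.
    """
    fb = list(frame_buffer)
    for start in range(0, len(addresses), batch_size):
        batch = addresses[start:start + batch_size]
        values = [fb[a] for a in batch]      # batched reads
        for addr, value in zip(batch, values):
            fb[addr] = modify(value)         # batched writes
    return fb


# Fourth and seventh reads both target address 3 (assumed sequence).
addresses = [0, 1, 2, 3, 4, 5, 3, 6]
increment = lambda value: value + 1

per_object = render([0] * 7, addresses, 4, increment)   # one batch per object
one_batch = render([0] * 7, addresses, 8, increment)    # both objects batched
print(per_object[3], one_batch[3])
```

When each object forms its own batch, address 3 ends with the value 2, reflecting both modifications. When both objects share one batch, the seventh read sees the unmodified value and address 3 ends with the value 1, so the first object's update is lost.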
To solve this problem, graphics systems often prohibit batching reads for one object with reads for another object. Accordingly, all reads and writes for one object are processed before any pixel data is read for another object, which avoids the consistency problem entirely. However, in modern graphics systems, graphical applications are using smaller and smaller objects to reduce artifacts when rendering curved surfaces, and the read/write turnaround penalty is getting proportionally larger with each new generation of memory. Thus, requiring that a first object be completely written before reads for a subsequent object can begin can result in significant system overhead and unacceptable delays.
Simple techniques have been proposed to detect overlaps in graphic images. One technique compares the minimal bounding rectangles of the respective objects to detect an overlap. However, this technique tends to produce a high false-positive rate and is generally considered to be of little practical use. More sophisticated techniques have been proposed which require extensive computation to determine whether the respective objects overlap. Such techniques are computationally expensive, and are made even more so because the comparison is not simply of two successive objects, but rather of multiple successive objects.
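A minimal sketch of the bounding-rectangle comparison shows how a false positive arises. The objects here are assumed to be given as sets of (x, y) pixel coordinates, and the two example objects are invented for illustration: their bounding rectangles coincide even though they share no pixel.

```python
def bounding_rect(pixels):
    """Return the minimal bounding rectangle (x0, y0, x1, y1) of a pixel set."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def rects_overlap(a, b):
    """True if two inclusive rectangles share at least one point."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


# Two thin diagonal objects (hypothetical data).
object_a = {(0, 0), (1, 1), (2, 2)}
object_b = {(0, 2), (2, 0)}

reported = rects_overlap(bounding_rect(object_a), bounding_rect(object_b))
actual = bool(object_a & object_b)
print(reported, actual)
```

The rectangle test reports an overlap while the pixel sets are disjoint, so a system relying on it would refuse to batch two objects that could safely have been batched.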
It has also been proposed to use a cache for the frame buffer memory in combination with rather complex bookkeeping to track dependencies created by overlaps. With an out-of-order implementation, overlapping objects could in many cases be painted faster than non-overlapping objects, because the overlapped pixel data need not be written to and reread from the frame memory before it is reused. Instead, the overlapped pixel data can be read directly from the cache and hence read in its current, e.g., modified, form. However, such a cache requires extensive resources and adds expense.
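The idea of serving reads from modified data that has not yet been written back can be sketched with a very small write-back store. This is a deliberately simplified illustration, not the bookkeeping of any actual design: `PixelCache` is a hypothetical name, capacity limits and eviction are ignored, and a dictionary stands in for the cache hardware.

```python
class PixelCache:
    """Toy write-back store placed in front of a frame buffer list."""

    def __init__(self, frame_buffer):
        self.frame_buffer = frame_buffer
        self.pending = {}   # address -> modified value not yet written back

    def read(self, addr):
        # Prefer the modified value if one is pending for this address.
        return self.pending.get(addr, self.frame_buffer[addr])

    def write(self, addr, value):
        # Hold the modified value instead of writing the frame buffer now.
        self.pending[addr] = value

    def flush(self):
        # Write all pending values back in one batch of writes.
        for addr, value in self.pending.items():
            self.frame_buffer[addr] = value
        self.pending.clear()
```

With such a store, two successive read-modify-write operations on the same address both take effect even though the frame buffer is written only once, at flush time. A real design would also need bounded storage and a replacement policy, which is where the resource cost noted above arises.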