The present invention relates in general to updating pixel data in a frame buffer and in particular to a raster operations unit with interleaving of read and write requests using PCI Express (PCI-E) or a similar communication link.
Graphics processors are used to render images in many computer systems. In a typical rendering process, the graphics processor receives primitives (e.g., points, lines, and/or triangles) representing objects in a scene. In accordance with instructions provided by an application program, the graphics processor transforms each primitive to a viewing space, then determines which pixels of the image are covered by the primitive. For each pixel that is covered, the graphics processor computes a color and depth (or Z) value, e.g., by executing a pixel shader program provided by the application program. The color and depth values computed for each pixel are provided to a raster operations (ROP) unit, which stores the image pixels in a frame buffer. As the ROP unit receives new depth and color values for a pixel, it compares the new depth value to a previous depth value stored in the frame buffer and determines whether to write new data for that pixel to the frame buffer. If new data is to be written, the ROP unit updates the depth and color values in the frame buffer based on the new data. A typical ROP unit can perform a variety of color blending operations between existing pixels and new pixels.
The ROP unit generates many data transfer requests to and from the frame buffer for each image. To execute the process described above, each time the ROP unit receives a new pixel, it reads the old pixel (at least the depth value) from the frame buffer. If the pixel is to be changed, the old color must also be read from the frame buffer so that it can be modified; the modified color and depth are then written back to the frame buffer. In some graphics systems, bandwidth between the ROP unit and the frame buffer can become a bottleneck, limiting system performance.
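The read-modify-write sequence described above can be sketched as follows. This is a minimal illustration only, not an actual ROP implementation: the names `Pixel`, `rop_update`, and `counters` are hypothetical, and the "nearer pixel wins" depth comparison and the single `alpha` blend factor are assumptions chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    color: tuple   # (r, g, b) components, assumed to be floats in [0, 1]
    depth: float   # Z value; smaller is assumed to mean nearer the viewer


def rop_update(frame_buffer, counters, index, new, alpha=1.0):
    """Apply one incoming pixel to the frame buffer, counting transfers.

    Returns True if the frame buffer was modified.
    """
    # Step 1: always read the old depth value.
    old_depth = frame_buffer[index].depth
    counters["reads"] += 1

    # Step 2: depth test (assumed: keep the new pixel only if it is nearer).
    if new.depth >= old_depth:
        return False

    # Step 3: the pixel will change, so read the old color for blending.
    old_color = frame_buffer[index].color
    counters["reads"] += 1

    # Step 4: blend new and old color (assumed: simple alpha blend).
    blended = tuple(alpha * n + (1.0 - alpha) * o
                    for n, o in zip(new.color, old_color))

    # Step 5: write the modified color and depth back.
    frame_buffer[index] = Pixel(blended, new.depth)
    counters["writes"] += 1
    return True
```

In this sketch, a pixel that passes the depth test costs two reads and one write, while a pixel that fails costs a single read, which suggests why frame buffer traffic grows quickly with the number of pixels processed.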
In one common approach to eliminating this bottleneck, the frame buffer is implemented in a memory device that is local to the graphics processor and dedicated to graphics use (referred to herein as a “graphics memory”). In these systems, the graphics processor is usually connected to the frame buffer by a wide, high-speed dedicated data path. This approach can be relatively expensive, requiring a large number of input/output (I/O) pins and consequently a large chip area.
For some low-cost or physically compact systems, it is desirable to avoid the extra cost associated with providing graphics memory. In these systems, sometimes referred to as “unified memory architectures” (UMA), the graphics processor generally uses an area of system memory to store the frame buffer. In a UMA system, the ROP unit communicates with the frame buffer using the bus that connects the graphics processor to the rest of the computer system.
Conventionally, buses for graphics devices have been implemented using protocols such as Peripheral Component Interconnect (PCI) or Accelerated Graphics Port (AGP). These protocols provide a “reversible” data path, with data moving “upstream” from the graphics processor to system memory via the same physical path as data moving “downstream.” Data can move in only one direction at a time. Generally, some amount of overhead is associated with bus “turnaround,” i.e., switching between upstream and downstream data transfers. For the repeated read-modify-write sequence of operations on the frame buffer performed by a typical ROP unit, this overhead can be considerable.
Consequently, in UMA systems using conventional bus protocols, the ROP unit is usually designed to minimize the number of times the bus is turned around. In one typical implementation, the ROP unit receives pixels to be processed in groups (e.g., 256 or 512 pixels). The ROP unit executes all of the read operations for the group and defers the writeback operations for the group until the last read is completed. The bus is turned around only twice per group (from read to write, then from write to read), reducing the overhead.
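The effect of grouping on bus turnarounds can be illustrated with a small counting model. This is a sketch under simplifying assumptions: every pixel is assumed to require exactly one read and one write, and the function names (`count_turnarounds`, `per_pixel_schedule`, `grouped_schedule`) are hypothetical.

```python
def count_turnarounds(ops):
    """Count direction switches in a sequence of 'R' (read) / 'W' (write) ops."""
    return sum(1 for prev, cur in zip(ops, ops[1:]) if prev != cur)


def per_pixel_schedule(n_pixels):
    """Read then immediately write back each pixel in turn."""
    return ["R", "W"] * n_pixels


def grouped_schedule(n_pixels, group_size):
    """Issue all reads for a group, then all writebacks for that group."""
    ops = []
    for start in range(0, n_pixels, group_size):
        n = min(group_size, n_pixels - start)
        ops += ["R"] * n + ["W"] * n
    return ops


# 1024 pixels: per-pixel read/write versus groups of 256 pixels.
print(count_turnarounds(per_pixel_schedule(1024)))     # 2047
print(count_turnarounds(grouped_schedule(1024, 256)))  # 7
```

Under these assumptions, grouping 256 pixels at a time reduces the number of turnarounds from 2047 to 7 for the same 1024 pixels, i.e., roughly two turnarounds per group.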
More recently, the PCI Express (PCI-E) “bus” protocol has been introduced. Unlike conventional buses, which provide reversible data transfer paths that are often shared by multiple devices, PCI-E provides each device with a dedicated “bidirectional” link that includes separate upstream and downstream data paths. Thus, on a PCI-E link data can flow in both directions at once. ROP implementations that are optimized for reversible data paths, which allow data to flow in only one direction at a time, use the PCI-E link in a relatively inefficient manner. It would, therefore, be desirable to provide a ROP unit that makes more efficient use of a link protocol such as PCI-E.
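The inefficiency noted above can be illustrated with a toy slot-based model of a link that has separate upstream and downstream paths. This is a sketch under strong assumptions, not a model of the actual PCI-E protocol: each transfer is assumed to occupy one time slot on one path, latency and packet overhead are ignored, every pixel is assumed to need one read and one write, and the function name `slots_needed` is hypothetical.

```python
def slots_needed(n_pixels, interleaved):
    """Time slots to read and write back n_pixels over a bidirectional link.

    In each slot, the downstream path can carry one read and the upstream
    path can carry one write. If `interleaved` is False, writes are deferred
    until every read has completed (the policy suited to a reversible bus).
    """
    reads_done = 0
    writes_done = 0
    slots = 0
    while writes_done < n_pixels:
        slots += 1
        # A write may be issued only for a pixel read in an earlier slot.
        can_write = writes_done < reads_done and (
            interleaved or reads_done == n_pixels
        )
        if can_write:
            writes_done += 1          # upstream path busy this slot
        if reads_done < n_pixels:
            reads_done += 1           # downstream path busy this slot
    return slots


print(slots_needed(256, interleaved=False))  # 512
print(slots_needed(256, interleaved=True))   # 257
```

In this simplified model, deferring all writes leaves one of the two paths idle in every slot, so a group of 256 pixels takes 512 slots, whereas overlapping writes with later reads takes 257 slots.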