The present invention relates generally to graphics processing subsystems with multiple processors and in particular to coherence of displayed images for split frame rendering in a multiprocessor graphics system.
Graphics processing subsystems are designed to render realistic animated images in real time, e.g., at 30 or more frames per second. These subsystems are most often implemented on expansion cards that can be inserted into appropriately configured slots on a motherboard of a computer system and generally include one or more dedicated graphics processing units (GPUs) and dedicated graphics memory. The typical GPU is a highly complex integrated circuit device optimized to perform graphics computations (e.g., matrix transformations, scan-conversion and/or other rasterization techniques, texture blending, etc.), write the resulting pixels to the graphics memory, and deliver the pixels in real time to a display device. The GPU is a co-processor that operates in response to commands received from a driver program executing on a different processor, generally the central processing unit (CPU) of the system.
To meet the demands for realism and speed, some GPUs include more transistors than typical CPUs. In addition, graphics memories have become quite large in order to improve speed by reducing traffic on the system bus; some graphics cards now include as much as 256 MB of memory. But despite these advances, a demand for even greater realism and faster rendering persists.
As one approach to meeting this demand, some manufacturers have begun to develop “multi-chip” (or multi-processor) graphics processing subsystems in which two or more GPUs, usually on the same card, operate in parallel. Parallel operation substantially increases the number of rendering operations that can be carried out per second without requiring significant advances in GPU design. To minimize resource conflicts between the GPUs, each GPU is generally provided with its own dedicated memory area, including a display buffer to which the GPU writes pixel data it renders.
In a multi-chip system, the processing burden may be divided among the GPUs in various ways. For example, in a “split frame rendering” mode (also referred to herein as “spatial parallelism”), each GPU is instructed to render pixel data for a different portion of the displayable image, such as a number of lines of a raster-based display. The image is displayed by scanning out the pixel data from each GPU's display buffer in an appropriate sequence. As a more concrete example, a graphics processing subsystem may use two GPUs to generate a displayable image consisting of M rows of pixel data; the first GPU can be instructed to render rows 1 through P, while the second GPU is instructed to render rows P+1 through M. In some multi-processor systems, the value of P can be dynamically modified to balance the load.
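The two-GPU row split described above can be illustrated with a short sketch. The function names `split_rows` and `scanout_order` are hypothetical and chosen only for illustration; the sketch assumes 1-based, inclusive row numbering as in the example of rows 1 through P and P+1 through M.

```python
def split_rows(m, p):
    """Return the inclusive, 1-based row ranges assigned to two GPUs.

    GPU 0 renders rows 1..P and GPU 1 renders rows P+1..M.
    (Hypothetical helper; the names are not from the source.)
    """
    if not 1 <= p < m:
        raise ValueError("P must satisfy 1 <= P < M")
    return (1, p), (p + 1, m)


def scanout_order(m, p):
    """List (gpu_index, row) pairs in the order the rows are scanned out."""
    (top_first, top_last), (bottom_first, bottom_last) = split_rows(m, p)
    top = [(0, row) for row in range(top_first, top_last + 1)]
    bottom = [(1, row) for row in range(bottom_first, bottom_last + 1)]
    return top + bottom


if __name__ == "__main__":
    print(split_rows(480, 200))
    print(scanout_order(6, 2))
```

In this sketch the scanout sequence simply reads all of the first GPU's rows and then all of the second GPU's rows, which is one possible "appropriate sequence" for a top/bottom split.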
Multi-chip graphics systems present a variety of problems, among which is “frame coherence.” In a single-processor system, the GPU typically has a rendering module that generates image data and a scanout module that reads out pixels of the most recently rendered image to a display device. The pixel buffer is double-buffered, with “front” and “back” frame buffers that each provide storage for a complete image. The scanout module reads pixels for the current image from the front frame buffer while the rendering module writes pixels for the new image to the back frame buffer. Once rendering of the new image is complete, the rendering module notifies the scanout module, and at the next appropriate opportunity (e.g., at the end of scanout of a complete frame), the buffers are flipped so that the back frame buffer becomes the front frame buffer and is scanned out while the former front frame buffer becomes the back frame buffer and receives data for a subsequent image.
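The double-buffering behavior just described can be modeled in a few lines. This is a minimal sketch, not a description of any actual GPU; the class name `DoubleBuffer` and its methods are hypothetical, a whole frame is reduced to a list filled with a single value, and the flip is assumed to occur at the end of scanout of a complete frame.

```python
class DoubleBuffer:
    """Toy model of a front/back frame buffer pair (hypothetical names)."""

    def __init__(self, num_pixels):
        self.buffers = [[0] * num_pixels, [0] * num_pixels]
        self.front = 0             # index of the buffer being scanned out
        self.flip_pending = False  # set when the back buffer holds a finished image

    @property
    def back(self):
        return 1 - self.front

    def render(self, frame_value):
        """Write a complete new image to the back buffer and request a flip."""
        size = len(self.buffers[self.back])
        self.buffers[self.back] = [frame_value] * size
        self.flip_pending = True

    def scanout(self):
        """Read out the front buffer, then flip if a new image is ready."""
        pixels = list(self.buffers[self.front])
        if self.flip_pending:
            # Flip only after the complete frame has been scanned out.
            self.front = self.back
            self.flip_pending = False
        return pixels


if __name__ == "__main__":
    db = DoubleBuffer(4)
    db.render(7)
    print(db.scanout())  # still the old image; the flip happens afterwards
    print(db.scanout())  # the newly rendered image
```

Deferring the flip to the end of scanout is what keeps a single displayed frame from mixing pixels of the old and new images.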
In a multi-processor system implementing split-frame rendering, it is not guaranteed that all of the GPUs will finish rendering their portions of the new image at the same time. If each GPU simply executes a buffer flip whenever it finishes its portion of the image, different portions of the displayed images will tend to become unsynchronized, leading to tearing and other visual artifacts.
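The incoherence problem can be made concrete with a toy timing model. The sketch below assumes that each GPU flips to the new frame as soon as it finishes rendering its own portion; the function names, the frame labels 0 and 1, and the time values are hypothetical and serve only to illustrate the effect.

```python
def displayed_frames(finish_times, scanout_time, old_frame=0, new_frame=1):
    """Return the frame each GPU's portion shows at scanout_time.

    A GPU that has already finished (and independently flipped) shows the
    new frame; one that is still rendering shows the old frame.
    """
    return [new_frame if t <= scanout_time else old_frame for t in finish_times]


def is_coherent(frames):
    """True when every portion of the display comes from the same frame."""
    return len(set(frames)) == 1


if __name__ == "__main__":
    # GPU 0 finishes at t=10, GPU 1 at t=14; scanout starts at t=12.
    frames = displayed_frames([10, 14], scanout_time=12)
    print(frames, is_coherent(frames))
```

With these assumed timings, the top portion of the display shows the new frame while the bottom portion still shows the old one, which is the kind of mismatch that appears as tearing.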
One solution is to attempt to dynamically balance the load, e.g., by modifying the value of P as noted above, so that the GPUs will finish at approximately the same time. However, it is difficult to maintain perfect balance where the image complexity is not static, as is usually the case for animated images.
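One simple way to adjust P from frame to frame is sketched below. This is an illustrative heuristic only, not a method taken from any particular system: it assumes that rendering cost is proportional to the number of rows, and the function name `rebalance` and the damping parameter `step` are hypothetical.

```python
def rebalance(p, m, time_top, time_bottom, step=0.5):
    """Return an updated split row P based on the last frame's render times.

    Assumes (as a simplification) that each GPU's cost is proportional to
    its row count. `step` damps the adjustment to avoid oscillation.
    """
    if time_top <= 0 or time_bottom <= 0:
        return p  # no usable timing data; keep the current split
    rate_top = p / time_top                # rows per unit time, GPU 0
    rate_bottom = (m - p) / time_bottom    # rows per unit time, GPU 1
    # Row count for GPU 0 at which both GPUs would finish together,
    # if the measured rates stayed constant.
    target = m * rate_top / (rate_top + rate_bottom)
    new_p = round(p + step * (target - p))
    return max(1, min(m - 1, new_p))       # keep both regions non-empty


if __name__ == "__main__":
    print(rebalance(240, 480, time_top=20.0, time_bottom=10.0))
```

Because the estimate relies on the previous frame's timings, it lags behind whenever image complexity changes between frames, which is the limitation noted above.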
Therefore, techniques for ensuring coherence of displayed images among multiple GPUs performing split-frame rendering in the presence of load imbalances would be desirable.