The present invention relates in general to graphics processing systems, and in particular to graphics processing systems with multiple processors connected in a ring topology such that pixel data can be transferred from any processor to any other processor.
Graphics subsystems are designed to render realistic animated images in real time, e.g., at 30 or more frames per second. These subsystems are most often implemented on expansion cards that can be inserted into appropriately configured slots on a motherboard of a computer system and generally include a dedicated graphics processing unit (GPU) and dedicated graphics memory. The typical GPU is a highly complex integrated circuit device optimized to perform graphics computations (e.g., matrix transformations, scan-conversion and/or other rasterization techniques, texture blending, etc.), write the resulting pixels to the graphics memory, and deliver the pixels in real time to a display device. The GPU operates in response to commands received from a driver program executing on a central processing unit (CPU) of the system.
To meet the demands for realism and speed, some GPUs include more transistors than typical CPUs. In addition, graphics memories have become quite large in order to improve speed by reducing traffic on the system bus; some graphics cards now include as much as 256 MB of memory. But despite these advances, a demand for even greater realism and faster rendering persists.
Consequently, some manufacturers have developed “multi-chip” graphics subsystems in which two or more GPUs, either on the same card or on two or more different cards, operate in parallel. Parallel operation substantially increases the number of rendering operations that can be carried out per second without requiring significant advances in GPU design. To minimize resource conflicts between the GPUs, each GPU is generally provided with its own dedicated memory area, including a display buffer to which the GPU writes pixel data it renders.
In a multi-chip system, two or more GPUs can be operated to render images cooperatively for the same display device; in this “distributed” rendering mode, rendering tasks are distributed among the GPUs. Tasks may be distributed in various ways. For example, in a “split frame rendering” mode, each GPU is instructed to render pixel data for a different portion of the displayable image, such as a number of lines of a raster-based display. The image is displayed by scanning out the pixel data from each GPU's display buffer and selecting a pixel generated by one or another of the GPUs depending on screen position. As another example, in an “alternate frame rendering” mode, each GPU is instructed to render pixel data for a different image in a temporal sequence (e.g., different frames of an animated image such as a 3D video game). In this mode, a smooth animation speed of about 30 frames per second can be provided by two GPUs that each render images at 15 Hz.
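By way of illustration only, the two distribution modes described above can be sketched as follows. The function names, the division of the display into equal contiguous bands of raster lines, and the round-robin assignment of frames are assumptions made for this sketch; they are not taken from any particular implementation.

```python
# Illustrative sketch only; all names and the even-banding policy are hypothetical.

def split_frame_owner(line, num_lines, num_gpus):
    """Split frame rendering: index of the GPU assumed to render raster line `line`.

    Assumes the display is divided into contiguous bands of roughly equal height.
    """
    if not 0 <= line < num_lines:
        raise ValueError("line out of range")
    return line * num_gpus // num_lines


def scan_out(display_buffers, num_lines):
    """Assemble the displayed image by taking each line from the owning GPU's buffer.

    `display_buffers[g][y]` stands for line y as stored in GPU g's display buffer.
    """
    num_gpus = len(display_buffers)
    return [
        display_buffers[split_frame_owner(y, num_lines, num_gpus)][y]
        for y in range(num_lines)
    ]


def alternate_frame_owner(frame, num_gpus):
    """Alternate frame rendering: frames are assumed to be handed out round-robin."""
    return frame % num_gpus


def per_gpu_rate(display_fps, num_gpus):
    """Rate at which each GPU must render to sustain `display_fps` overall."""
    return display_fps / num_gpus
```

For example, with two GPUs and a 480-line display, lines 0 to 239 would come from the first GPU's buffer and lines 240 to 479 from the second, and `per_gpu_rate(30, 2)` gives the 15 Hz per-GPU rate mentioned above.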
Existing display devices are generally configured to receive data for each screen pixel serially through one interface. Consequently, the multichip graphics system generally needs to route all of the pixel data to a single path for delivery. For instance, one GPU (referred to herein as a “master” GPU) might be connected to the monitor interface, with all other GPUs delivering their data to the master GPU via various communication paths that may include bus connections and/or dedicated point-to-point links between two GPUs.
Some multichip systems are created by interconnecting two or more single-chip graphics cards in a unidirectional daisy chain and connecting a monitor to one of the cards. If each card provides a connector for a monitor, it is not possible to identify a master GPU until the system is built and the monitor is connected. For instance, if the user is confronted with two cards, each of which presents an identical monitor connector, the user might connect the monitor to either card. If the monitor is connected to a GPU that cannot receive data from another GPU, the benefits of having two GPUs may be lost. Further, as the number of GPUs and possible locations for monitor connections increases, the likelihood that the user will correctly identify the best location (i.e., the location at the receiving end of the daisy chain) to connect a monitor decreases.
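The unidirectional case can be modeled with a minimal sketch. It assumes a chain in which GPU 0 sends to GPU 1, GPU 1 to GPU 2, and so on, and, purely for illustration, a user who picks among identical connectors uniformly at random. The function names are hypothetical.

```python
# Illustrative model of a unidirectional daisy chain 0 -> 1 -> ... -> n-1.
# Names and the uniform-random-choice assumption are hypothetical.

def upstream_gpus(index):
    """GPUs whose pixel data can reach GPU `index` (all GPUs earlier in the chain)."""
    return list(range(index))


def can_be_master(chain_length, index):
    """A GPU can act as master only if it can receive data from every other GPU."""
    return len(upstream_gpus(index)) == chain_length - 1


def valid_master_positions(chain_length):
    """Positions at which a connected monitor would benefit from all GPUs."""
    return [i for i in range(chain_length) if can_be_master(chain_length, i)]


def chance_of_correct_guess(chain_length):
    """Probability of choosing a valid position, assuming a uniform random choice."""
    return len(valid_master_positions(chain_length)) / chain_length
```

Under these assumptions only the last GPU in the chain qualifies, so the chance of a correct connection falls from 1/2 with two cards to 1/4 with four.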
In other multichip systems, two or more single-chip graphics cards are connected in a bidirectional daisy chain. If there are only two GPUs, either GPU can receive data from the other, and the user may connect a monitor to either card without losing the benefits of having two GPUs. If, however, there are more than two GPUs, the GPUs that are not at either end of the chain cannot operate as masters to all of the other GPUs. As in the unidirectional case, the likelihood that the user will correctly identify the best location to connect a monitor decreases as the number of GPUs increases.
It would therefore be desirable to provide multichip systems in which the GPUs can automatically be configured to support distributed rendering operations regardless of where a monitor is connected.