The present invention relates to the field of computer graphics. Many computer graphics images are created by mathematically modeling the interaction of light with a three-dimensional scene from a given viewpoint. This process, called rendering, generates a two-dimensional image of the scene from the given viewpoint and is analogous to taking a photograph of a real-world scene.
As the demand for computer graphics, and in particular for real-time computer graphics, has increased, computer systems with graphics processing subsystems adapted to accelerate the rendering process have become widespread. In these computer systems, the rendering process is divided between a computer's general-purpose central processing unit (CPU) and the graphics processing subsystem. Typically, the CPU performs high-level operations, such as determining the position, motion, and collision of objects in a given scene. From these high-level operations, the CPU generates a set of rendering commands and data defining the desired rendered image or images. For example, rendering commands and data can define scene geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The graphics processing subsystem creates one or more rendered images from the set of rendering commands and data.
To maximize rendering performance, the graphics processing subsystem may include two or more graphics processing units (GPUs) operating in parallel. The graphics processing units can divide the rendering workload in a number of different ways. For example, different portions of an image can be rendered in parallel by different GPUs. The portions are then combined to produce a complete rendered image. In another example parallel rendering scheme, each GPU renders one image in a sequence of images.
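The two example schemes above can be illustrated with a minimal sketch. It assumes the image is divided into contiguous horizontal bands of scanlines, which is only one of many possible ways to partition an image, and the function names `split_frame` and `alternate_frame` are hypothetical and chosen here for illustration only.

```python
def split_frame(height, num_gpus):
    """Divide scanlines [0, height) into contiguous bands, one per GPU.

    Assumption: portions are horizontal bands of nearly equal size.
    Returns a list of (first_row, end_row) pairs, end_row exclusive.
    """
    base, extra = divmod(height, num_gpus)
    bands = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional row each so that
        # every scanline is assigned exactly once.
        rows = base + (1 if gpu < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


def alternate_frame(frame_index, num_gpus):
    """Pick the GPU that renders a whole frame, in round-robin order."""
    return frame_index % num_gpus
```

Under these assumptions, `split_frame(1080, 2)` assigns rows 0 to 539 to the first GPU and rows 540 to 1079 to the second, while `alternate_frame` sends even-numbered frames to the first GPU and odd-numbered frames to the second.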
Transferring rendering commands and data from a CPU to two or more GPUs is one difficulty arising from parallel rendering schemes. In many parallel rendering schemes, the same rendering commands and data need to be distributed to all of the GPUs; however, the CPU typically communicates rendering commands and data via a CPU bus to each GPU separately. Thus, the bandwidth required to operate multiple GPUs scales linearly with the number of GPUs. Because the bandwidth of the CPU bus is limited, a system with two or more GPUs operating in parallel can saturate the CPU bus. In these situations, the CPU bus is the limiting factor in overall graphics performance, and additional GPUs will not provide any performance improvement. Furthermore, having the CPU communicate the same rendering commands and data to each GPU separately wastes CPU cycles, for example, by requiring the CPU to write the same rendering commands and data several times.
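The linear scaling described above can be made concrete with a small sketch. The model is deliberately simplified: it assumes each GPU needs an identical copy of the command stream and that a saturated bus is shared equally among the GPUs. The function names and the bandwidth figures in the usage note are hypothetical and do not describe any particular system.

```python
def required_bus_bandwidth(stream_rate, num_gpus):
    """Bus bandwidth needed when the CPU sends the same command
    stream to each GPU separately: it grows linearly with GPU count."""
    return stream_rate * num_gpus


def per_gpu_delivery_rate(stream_rate, bus_capacity, num_gpus):
    """Rate at which each GPU actually receives the command stream.

    Assumption: once the bus is saturated, its capacity is split
    evenly across the GPUs.
    """
    return min(stream_rate, bus_capacity / num_gpus)
```

With an illustrative stream of 800 MB/s per GPU and a 2400 MB/s bus, three GPUs need exactly the full bus capacity. A fourth GPU raises the requirement to 3200 MB/s, and under the equal-sharing assumption each GPU then receives commands at only 600 MB/s, so adding the GPU slows command delivery to every GPU.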
Therefore, it is desirable to have an efficient system and method for transferring rendering commands and data between the CPU and multiple graphics processing units operating in parallel. It is further desirable to eliminate the CPU bus as a potential graphics performance bottleneck in operating multiple graphics processing units. It is also desirable to reduce the number of CPU cycles wasted in communicating with multiple graphics processing units in the graphics processing subsystem.