Digital computers are being used today to perform a wide variety of tasks. A primary means for interfacing a computer system with its user is through its graphics display. The graphical depiction of data, through, for example, full motion video, detailed true color images, photorealistic 3-D modeling, and the like, has become a preferred mechanism for presenting complex data to the user. Increasing the performance and/or the realism of interactive three-dimensional images and scenes has become a primary driver for the increasing performance of newer computer systems.
Generally, a desktop computer system equipped to handle 3-D image data includes a specialized graphics processor unit, or GPU, in addition to the traditional CPU. The GPU includes specialized hardware configured to handle 3-D computer-generated objects. In a typical 3-D computer-generated object, the surfaces of the object are described by data models. The GPU is configured to operate on these data models and their constituent “primitives” (usually mathematically described polygons and polyhedra) that define the shape of the object, the object attributes, and the connectivity and positioning data describing how the objects fit together. Generally, the component polygons and polyhedra connect at common edges defined in terms of common vertices and enclosed volumes. The hardware of the GPU processes the objects, implementing the calculations required to produce realistic 3-D images. This processing results in the polygons being texture-mapped, Z-buffered, and shaded onto an array of pixels, creating the realistic 3-D image.
In a typical graphics computer system, most of the processor-intensive rendering computations are performed by the GPU included in the computer system. For example, the 3-D object data models are “traversed” by a graphics driver software program (e.g., in response to user input) running on both the GPU and the CPU of the computer system. Generally, the primitives describing the 3-D object are processed by the CPU and sent to the GPU for rendering. For example, a 3-D polyhedral model of an object is sent to the GPU as contiguous strips of polygons, forming a graphics data stream (e.g., primitives, rendering commands, instructions, etc.). This graphics data stream provides the GPU with the information required to render the 3-D object and the resulting scene. Such information includes, for example, specular highlighting, anti-aliasing, depth, transparency, and the like. Using this information, the GPU performs all the computational processing required to realistically render the 3-D object. The hardware of the GPU is specially tuned to perform such processing more quickly and efficiently than the CPU.
The performance of a typical graphics rendering process as implemented on a graphics computer system is highly dependent upon the performance of the underlying hardware. High-performance graphics rendering requires high data transfer bandwidth to the memory storing the 3-D object data and the constituent primitives. Thus, prior art GPU subsystems (e.g., GPU-equipped graphics cards) typically include a specialized, high-bandwidth local graphics memory for feeding the required data to the GPU.
A problem with typical prior art GPU subsystems is that the data transfer bandwidth to the system memory, or main memory, of a computer system is much lower than the data transfer bandwidth to the local graphics memory. A GPU subsystem needs to communicate with system memory in order to exchange data with the CPU and interact with programs executing on the CPU. This communication occurs across the graphics bus, that is, the bus that connects the graphics subsystem to the CPU and system memory. For example, 3-D objects and their primitives need to be transferred from system memory, where they are used by a program executing on the CPU, into the local graphics memory of the graphics subsystem for rendering. The low data transfer bandwidth of the graphics bus thus acts as a bottleneck on overall graphics rendering performance.
The low data transfer bandwidth of the graphics bus constricts the flow of data in both directions. For example, it acts as a bottleneck for applications in which data needs to be read back from the graphics subsystem to the CPU. Such applications include post-transform applications, in which 3-D object data needs to be read back to the CPU after transformation for use by programs executing on the CPU. Thus, even though the CPU is designed to have a very high data transfer bandwidth to system memory, programs executing on the CPU are constrained by the much lower data transfer bandwidth of the graphics bus; for example, a real-time 3-D application must wait for post-transform information from the graphics subsystem.
Thus, what is required is a solution capable of overcoming the limitations imposed by the data transfer bandwidth of the graphics bus of a computer system. In particular, what is required is a solution that eliminates the bottleneck created by the graphics bus, whose data transfer bandwidth is much smaller than both the bandwidth between the GPU and local graphics memory and the bandwidth between the CPU and system memory. The present invention provides a novel solution to the above requirements.