Generally, a computer system suited to handle 3D image data includes a specialized graphics processing unit, or GPU, in addition to a traditional CPU (central processing unit). The GPU includes specialized hardware configured to handle 3D computer-generated objects. The GPU is configured to operate on a set of data models and their constituent “primitives” (usually mathematically described polygons) that define the shapes, positions, and attributes of the objects. The hardware of the GPU processes the objects, implementing the calculations required to produce realistic 3D images on a display of the computer system.
The performance of a typical graphics rendering process is highly dependent upon the performance of the system's underlying hardware. High-performance real-time graphics rendering requires high data transfer bandwidth to the memory storing the 3D object data and the constituent primitives. Thus, more expensive prior art GPU subsystems (e.g., GPU-equipped graphics cards) typically include larger (e.g., 128 MB or larger) specialized, expensive, high-bandwidth local graphics memories for feeding the required data to the GPU. Less expensive prior art GPU subsystems include smaller (e.g., 64 MB or less) local graphics memories of the same kind, and some of the least expensive GPU subsystems have no local graphics memory at all.
A problem with the prior art low-cost GPU subsystems (e.g., those having smaller amounts of local graphics memory) is that the data transfer bandwidth to the system memory, or main memory, of a computer system is much less than the data transfer bandwidth to the local graphics memory. Typical GPUs with any amount of local graphics memory need to read command streams and scene descriptions from system memory. A GPU subsystem with a small or absent local graphics memory also needs to communicate with system memory in order to access and update pixel data, including the pixels representing images which the GPU is constructing. This communication occurs across a graphics bus, that is, the bus that connects the graphics subsystem to the CPU and system memory.
In one example, per-pixel Z-depth data is read across the graphics bus and compared with a computed value for each pixel to be rendered. For each pixel whose computed Z value is less than the Z value read from system memory, the computed Z value and the computed pixel color value are written to system memory. In another example, pixel colors are read from system memory and blended with computed pixel colors to produce translucency effects before being written back to system memory. Higher resolution images (images with a greater number of pixels) require more system memory bandwidth to render, as do images representing larger numbers of 3D objects. The low data transfer bandwidth of the graphics bus thus acts as a bottleneck on overall graphics rendering performance.
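The two per-pixel examples above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular GPU: the class name, the method names, the 32-bit depth and color sizes, and the byte counter are all assumptions introduced here to make the memory traffic visible.

```python
# Illustrative model only. Every name and size below is an assumption made
# for this sketch; none of them comes from the text.
BYTES_PER_Z = 4      # assumed 32-bit depth value
BYTES_PER_COLOR = 4  # assumed 32-bit packed color


class SystemMemoryFramebuffer:
    """Z and color buffers held in system memory, with a traffic counter."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # Depth starts at "infinitely far"; color starts at black.
        self.z = [float("inf")] * (width * height)
        self.color = [(0.0, 0.0, 0.0)] * (width * height)
        self.bus_bytes = 0  # bytes moved between GPU and system memory

    def depth_tested_write(self, x, y, z_new, color_new):
        """Read the stored Z; write Z and color only if the new pixel is nearer."""
        i = y * self.width + x
        z_old = self.z[i]
        self.bus_bytes += BYTES_PER_Z  # Z read
        if z_new < z_old:
            self.z[i] = z_new
            self.color[i] = color_new
            self.bus_bytes += BYTES_PER_Z + BYTES_PER_COLOR  # Z and color write
            return True
        return False

    def blended_write(self, x, y, color_new, alpha):
        """Read the stored color, blend the new color over it, write it back."""
        i = y * self.width + x
        old = self.color[i]
        self.bus_bytes += BYTES_PER_COLOR  # color read
        blended = tuple(alpha * n + (1.0 - alpha) * o
                        for n, o in zip(color_new, old))
        self.color[i] = blended
        self.bus_bytes += BYTES_PER_COLOR  # color write
        return blended


def worst_case_frame_bytes(width, height):
    """Traffic if every pixel is drawn once and passes the depth test."""
    return width * height * (2 * BYTES_PER_Z + BYTES_PER_COLOR)


fb = SystemMemoryFramebuffer(2, 2)
fb.depth_tested_write(0, 0, 0.5, (1.0, 0.0, 0.0))  # nearer: Z and color written
fb.depth_tested_write(0, 0, 0.7, (0.0, 1.0, 0.0))  # farther: only the Z read
fb.blended_write(0, 0, (0.0, 0.0, 1.0), 0.5)       # read, blend, write back
print(fb.bus_bytes)
```

Under these assumed sizes, with a single layer of geometry in which every pixel passes the depth test, a 640 x 480 frame moves 3,686,400 bytes and a 1280 x 960 frame moves four times as much. Overlapping objects add further reads and writes for each covered pixel, which is the sense in which both resolution and scene complexity increase the demand on system memory bandwidth.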
Thus, what is required is a solution capable of reducing the limitations imposed by the limited data transfer bandwidth of a graphics bus of a computer system. What is further required is a solution that ameliorates the bottleneck imposed by the much smaller data transfer bandwidth of the graphics bus in comparison to the data transfer bandwidth between the GPU and local graphics memory. The present invention provides a novel solution to the above requirements.