1. Field of the Invention
The present invention relates generally to the field of graphics processing and more specifically to a system and method for transferring pre-computed Z-values between graphics processing units (GPUs).
2. Description of the Related Art
A typical computing system includes a central processing unit (CPU), an input device, a system memory, one or more graphics processing units (GPUs), and one or more display devices. A variety of software application programs may run on the computing system. The CPU usually executes the overall structure of the software application program and configures the GPUs to perform specific tasks in the graphics pipeline. Some computing systems include both an integrated GPU (IGPU) and a higher-performance discrete GPU (DGPU). Such a computing system may support a hybrid performance mode in which the IGPU is configured to supplement the performance of the DGPU, thereby increasing the efficiency of the graphics pipeline.
In one approach to implementing a hybrid performance mode, the IGPU runs one image frame ahead of the DGPU, rendering only depth of field values (ignoring all color information) to establish the closest surfaces to the viewer. While rendering, the IGPU maintains the minimum Z-value, which corresponds to the closest depth of field value, for each pixel in the image frame using a two-dimensional array known as a Z-buffer. After the IGPU pre-computes the Z-buffer, a DMA (direct memory access) engine copies the Z-buffer from the IGPU local memory to the system memory and, subsequently, copies the pre-computed Z-buffer from the system memory to the DGPU local memory. The DGPU then renders the image frame with full shading (including color information), using the pre-computed Z-buffer to avoid rendering pixel fragments (i.e., the fragment of each pixel intersected by an object) in the image that would otherwise be occluded by closer geometries in the image being rendered. Ignoring the color information allows the IGPU to efficiently pre-compute the Z-buffer, while starting with the pre-computed Z-buffer allows the DGPU to forestall time-consuming shading operations.
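By way of illustration only, the two-pass scheme described above may be sketched in software as follows. The sketch is a simplified model and not the hardware implementation: the Fragment type and the functions depth_prepass and shade_with_prepass are hypothetical names introduced for this example, a smaller Z-value is assumed to be closer to the viewer, and a color string stands in for the result of a costly shading operation.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    z: float    # assumption: smaller z means closer to the viewer
    color: str  # stands in for an expensive shading result


def depth_prepass(fragments, width, height):
    """Z-only pass (IGPU role): keep the minimum Z per pixel, ignore color."""
    zbuf = [[float("inf")] * width for _ in range(height)]
    for f in fragments:
        if f.z < zbuf[f.y][f.x]:
            zbuf[f.y][f.x] = f.z
    return zbuf


def shade_with_prepass(fragments, zbuf):
    """Full pass (DGPU role): shade only fragments not occluded in zbuf."""
    height, width = len(zbuf), len(zbuf[0])
    frame = [[None] * width for _ in range(height)]
    shaded = 0
    for f in fragments:
        if f.z > zbuf[f.y][f.x]:
            continue  # occluded by closer geometry, so shading is skipped
        frame[f.y][f.x] = f.color
        shaded += 1
    return frame, shaded


fragments = [
    Fragment(0, 0, 0.9, "red"),    # hidden behind the blue fragment
    Fragment(0, 0, 0.2, "blue"),   # closest surface at pixel (0, 0)
    Fragment(1, 0, 0.5, "green"),  # only surface at pixel (1, 0)
]
zbuf = depth_prepass(fragments, width=2, height=1)
frame, shaded = shade_with_prepass(fragments, zbuf)
print(zbuf)           # [[0.2, 0.5]]
print(frame, shaded)  # [['blue', 'green']] 2
```

In this example, three fragments are submitted but only two are shaded, because the red fragment fails the comparison against the pre-computed Z-buffer before any shading work is performed.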
One drawback to this approach, however, is that the size of the pre-computed Z-buffer that the DMA engine copies from the IGPU local memory to the DGPU local memory via the system memory is usually quite large. For example, for a 1600-by-1200 pixel image frame, the pre-computed Z-buffer may include nearly 8 MB of data. Transferring this large volume of data may strain the system memory bandwidth, thereby becoming a bottleneck in the graphics pipeline and hindering overall system performance. In addition, transferring the Z-buffer with a DMA engine oftentimes invalidates the Z-buffer compression techniques that the DGPU uses to efficiently process the Z-buffer. As a result, the DGPU has to use an uncompressed Z-buffer while rendering, which reduces the performance of the DGPU.
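The size figure above may be checked with simple arithmetic. The following sketch assumes a 32-bit (4-byte) Z-value per pixel, which is consistent with the stated figure of nearly 8 MB but is not specified in the description. The frame rate of 60 frames per second is likewise a purely illustrative assumption, used only to indicate the order of magnitude of the resulting system memory traffic.

```python
WIDTH, HEIGHT = 1600, 1200
BYTES_PER_Z = 4        # assumption: 32-bit Z-value per pixel
COPIES_PER_FRAME = 2   # IGPU local -> system memory, then system memory -> DGPU local
FRAMES_PER_SECOND = 60 # assumption: illustrative frame rate only

zbuffer_bytes = WIDTH * HEIGHT * BYTES_PER_Z
traffic_bytes_per_second = zbuffer_bytes * COPIES_PER_FRAME * FRAMES_PER_SECOND

print(zbuffer_bytes)             # 7680000 bytes, i.e. 7.68 MB per Z-buffer
print(traffic_bytes_per_second)  # 921600000 bytes moved through system memory per second
```

Under these assumptions, each Z-buffer crosses the system memory twice per frame, so the copy traffic alone approaches 1 GB per second.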
In another approach, the IGPU pre-computes the Z-buffer for an image frame, specialized hardware transfers the Z-buffer directly from the IGPU local memory to the DGPU local memory, and the DGPU renders the image frame using the pre-computed and compressed Z-buffer. While this solution avoids system memory bandwidth limitations and retains Z-buffer compression, not all computing systems include the specialized hardware needed to directly transfer the Z-buffer.
As the foregoing illustrates, what is needed in the art is a more efficient and flexible technique for transferring pre-computed Z-values between GPUs.