1. Field of the Invention
Embodiments of the present invention relate generally to video processing and more specifically to performing anti-aliasing operations with multiple graphics processing units.
2. Description of the Related Art
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Anti-aliasing processing typically occurs during the rendering stage of video processing and is generally used to diminish jaggies, which are stair-like lines that appear at places in an image where there should be smooth, straight lines or curves. One anti-aliasing technique is to obtain several samples of each displayed pixel and compute an average of the samples to determine the color of the pixel. To lessen the amount of time required to perform this technique, one approach is to distribute the computation steps across multiple graphics processing units (“GPUs”).
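The sample-averaging technique described above can be illustrated with a minimal sketch. The function name `resolve_pixel`, the RGB tuple representation, and the sample values are assumptions made for illustration only; they do not come from any particular graphics API.

```python
def resolve_pixel(samples):
    """Average a list of (r, g, b) samples into a single pixel color."""
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(3))


# Hypothetical edge pixel: two samples land on a white shape,
# two samples land on the black background.
edge_samples = [(1.0, 1.0, 1.0)] * 2 + [(0.0, 0.0, 0.0)] * 2
color = resolve_pixel(edge_samples)
print(color)
```

A pixel that straddles an edge thus receives an intermediate color, which is what softens the stair-step appearance.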
FIG. 1A illustrates one eight-times-sampling anti-aliasing (“8×AA”) operation using two GPUs in a graphics system. Specifically, instead of a single GPU performing anti-aliasing on 8 samples of each displayed pixel, each of two GPUs performs a four-times-sampling anti-aliasing (“4×AA”) operation on 4 samples per pixel in parallel, as represented by blocks 100 and 102. As shown in FIG. 1B, the 4 samples that GPU0 operates on are located around point 120, and the 4 different samples that GPU1 operates on are located around point 122, which corresponds to point 120 plus offset 124.
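The following sketch suggests why two 4×AA results can stand in for one 8×AA result. It assumes that the two partial results are later combined with equal weights, which the description of FIG. 1A does not state explicitly; the sample values and variable names are likewise hypothetical.

```python
def average(values):
    """Arithmetic mean of a list of sample values."""
    return sum(values) / len(values)


# Eight hypothetical single-channel samples for one displayed pixel.
samples = [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 0.5, 0.0]

gpu0_result = average(samples[:4])  # 4 samples around point 120
gpu1_result = average(samples[4:])  # 4 samples around point 122

# Assumed equal-weight combination of the two 4xAA results.
combined = 0.5 * gpu0_result + 0.5 * gpu1_result

# Single-GPU 8xAA over all eight samples, for comparison.
reference = average(samples)
print(gpu0_result, gpu1_result, combined, reference)
```

Because both GPUs process the same number of samples, the mean of the two partial means equals the mean over all eight samples.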
Suppose GPU0 is the primary GPU in the graphics system, and GPU1 is the secondary GPU. Before GPU0 can transmit its local frame buffer to the display device in block 108 of FIG. 1A, an operation referred to herein as “scanning out,” GPU1 transfers the output of block 102 into a temporary buffer in a direct memory access (“DMA”) copy operation in block 104. Such an operation is commonly referred to as a “blit.” GPU1 effectively “pushes” the results of the 4×AA operation from its local frame buffer to the temporary buffer. GPU0 then needs to pull the data from the temporary buffer and combine the data with the content of its entire local frame buffer in block 106. Together, these two steps are commonly referred to as a “pull and blend” operation. In this implementation, because both GPU0 and GPU1 access the same memory location of the temporary buffer, GPU0 needs to wait until GPU1 completes its blitting operation before it can proceed with its pull and blend operation.
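The ordering constraint in this implementation can be sketched as follows. Plain Python lists stand in for the frame buffers and the temporary buffer, and the helper names `blit` and `pull_and_blend`, the pixel values, and the equal-weight blend are all assumptions for illustration; real hardware would perform these steps with DMA engines rather than loops.

```python
def blit(src, dst):
    """Copy every pixel of src into dst (stands in for the DMA copy of block 104)."""
    dst[:] = src


def pull_and_blend(temp, local):
    """Blend temp into local in place with equal weights (stands in for block 106)."""
    for i, value in enumerate(temp):
        local[i] = 0.5 * (local[i] + value)


gpu0_fb = [0.25, 0.5, 1.0]   # hypothetical 4xAA output of GPU0
gpu1_fb = [0.75, 0.5, 0.0]   # hypothetical 4xAA output of GPU1
temp_buffer = [0.0] * 3      # single shared temporary buffer

blit(gpu1_fb, temp_buffer)            # step 1: GPU1 pushes; must finish first
pull_and_blend(temp_buffer, gpu0_fb)  # step 2: GPU0 pulls and blends
print(gpu0_fb)                        # contents that GPU0 would then scan out
```

With a single temporary buffer, step 2 reads exactly the memory that step 1 writes, so the two steps cannot safely overlap.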
In an alternative implementation, the push operation in block 104 and the pull and blend operation in block 106 shown in FIG. 1A can potentially overlap. FIG. 1C is a conceptual diagram of a temporary buffer bank and its interactions with the local frame buffers of two GPUs. Temporary buffer bank 130 corresponds to a memory block, which includes two partitions, bank a and bank b. In data transfer 132, GPU1 blits data from its local frame buffer to bank b. In data transfer 134, GPU0 pulls and blends data that have been previously stored in bank a with data from its local frame buffer. Since bank a and bank b occupy different memory locations, data transfer 132 and data transfer 134 can proceed independently. After the completion of the data transfers, banks a and b swap roles, and the process of pushing and pulling and blending repeats. Even with the use of a swapping temporary buffer bank in this implementation, only a single GPU, GPU0, performs the pulling and blending operation in a multi-GPU system.
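The swapping scheme of FIG. 1C resembles a two-bank (ping-pong) buffer, sketched below. The class `TempBufferBank`, the function `run_frames`, the frame data, and the equal-weight blend are hypothetical. The loop runs sequentially, so it only models which bank each transfer touches in a given step; it does not demonstrate actual concurrency.

```python
class TempBufferBank:
    """Two-partition temporary buffer; one bank is written while the other is read."""

    def __init__(self, size):
        self.banks = [[0.0] * size, [0.0] * size]  # index 0: bank a, index 1: bank b
        self.write_index = 1                       # GPU1 starts by writing bank b

    @property
    def write_bank(self):
        return self.banks[self.write_index]

    @property
    def read_bank(self):
        return self.banks[1 - self.write_index]

    def swap(self):
        self.write_index = 1 - self.write_index


def run_frames(gpu0_frames, gpu1_frames):
    """Blend each GPU0 frame with the matching GPU1 frame, one step behind the push."""
    bank = TempBufferBank(len(gpu0_frames[0]))
    outputs = []
    for n in range(len(gpu1_frames) + 1):
        if n < len(gpu1_frames):
            # Data transfer 132: GPU1 pushes frame n into the write bank.
            bank.write_bank[:] = gpu1_frames[n]
        if n >= 1:
            # Data transfer 134: GPU0 blends frame n-1 from the read bank.
            local = gpu0_frames[n - 1]
            outputs.append([0.5 * (a + b) for a, b in zip(local, bank.read_bank)])
        bank.swap()  # the banks exchange roles for the next step
    return outputs


frames = run_frames([[0.0, 1.0], [0.5, 0.5]], [[1.0, 0.0], [0.5, 1.0]])
print(frames)
```

In each step the push and the blend address different banks, which is what would allow them to overlap on real hardware; the blend itself still runs on GPU0 alone.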
As the foregoing illustrates, utilizing only a single GPU in a multi-GPU system to perform the pull and blend operations serializes the anti-aliasing processing for the overall system. Thus, what is needed is a way to increase the efficiency of these pull and blend operations.