1. Field of Invention
The present invention relates generally to the field of 3D computer graphics rendering, and more particularly, to ways of and means for improving the performance of parallel graphics processes running on 3D parallel graphics processing systems supporting the decomposition of 3D scene objects among its multiple graphics processing pipelines (GPPLs).
2. Brief Description of the State of Knowledge in the Art
Applicants' U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, incorporated herein by reference, in its entirety, discloses diverse kinds of PC-level computing systems embodying different types of parallel graphics rendering subsystems (PGRSs) with graphics processing pipelines (GPPLs) generally illustrated in FIG. 1. The multi-pipeline architecture of such systems can be realized using GPU-based GPPLs of classical design, as shown in FIG. 2A, or alternatively, using more advanced GPU-based GPPLs, compliant with the DirectX 10 standard, as shown in FIG. 2C. Alternatively, the multi-pipeline architecture of such systems can be realized using multi-core CPU based GPPLs as shown in FIG. 2C.
In general, such graphics-based computing systems support multiple modes of graphics rendering parallelism across their GPPLs, including time, image and object division modes, which can be adaptively and dynamically switched into operation during the run-time of any graphics application running on the host computing system. While each mode of parallel operation has its advantages, as described in U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, supra, the object division mode of parallel operation is particularly helpful during the running of interactive gaming applications because this mode has the potential of resolving many bottleneck conflicts which naturally accompany such demanding applications.
During the object division mode of parallel operation, supported on a parallel graphics rendering system, for example, of the type disclosed in Applicant's U.S. application Ser. No. 11/897,536 filed Aug. 30, 2007, objects within a 3D scene (i.e. graphics data and commands representative thereof) are (i) automatically decomposed based on a specified criteria, and assigned/designated to particular GPUs, and (ii) distributed to the assigned/designated GPUs, so that the GPUs can render partial images of the 3D scene, based on the assigned/designated objects distributed thereto during parallel rendering operations, and ultimately, for these partial image fragments to be re-composited in a final color frame buffer (FB) of the primary GPPL, for display on one or more visual display devices.
During conventional image recomposition processes, supported on parallel graphics rendering platforms operating in the object-division mode of parallelism, the pixel depth or z values of objects's images within the 3D scene must be analyzed/compared, during each image frame, against (i) the pixel depth values of other objects' images (which may be occluding a particular object during rendering), as well as (ii) the rear or background clipping plane represented within the 3D scene. This depth-based image recomposition process is illustrated in FIGS. 1E1, 1E2 and 1E3, and illustrates how local depth maps of objects assigned to particular GPUs are constructed within each GPU, and are used during the recomposition of partial image fragments generated within the color frame buffer of each GPU during the final stage of the object-division (OD) based image recomposition process.
As shown in FIG. 1E1, in conventional prior art Object Division, shows a simple scene, comprising of three objects, A,B and C. An exemplary decomposition of this scene can be done by sending object A for rendering to GPU 1, and objects B and C to GPU 2. FIGS. 1E2 and 1E3 show the color and Z buffers created by prior art Object Division method. From the given View Point object B (of GPU 2) is obstructed by object A (of GPU 1). Both Z-buffers, of GPU 1 and GPU 2, create local depth maps, each map constructed from objects designated to the GPU. Each GPU is unaware of objects rendered by the other GPU, therefore such objects are not reflected in the Z-buffer of the GPU.
Clearly, use of the object-division mode of graphics parallelism has a number of important advantages over the other methods of parallel graphics rendering, for example: (i) responsiveness to user interface inputs; (ii) parallelization of the entire 3D graphics pipeline including the vertex as well as pixel parts thereof; (iii) the reduction of CPU-GPU transfer load; and (iv) the reduction of GPU memory requirements. However, the OD mode of graphics parallelism suffers from a number of inherent shortcomings and drawbacks.
In particular, the object-division mode of parallelism requires a complex and intensive process of merging a plurality of partial image fragments buffered in color frame buffers (FBs), utilizing depth-based information stored in the Z buffers of the GPUs, involving in depth-based comparisons on a pixel-by-pixel basis, resulting in substantial time delays, significant bandwidth consumption, and high hardware costs.
Also, objects being rendered at each GPU, that are obstructed by objects rendered by other GPUs, are processed for rendering (i.e. drawn) as if these objects were visible. Although these redundant portions are eliminated during the final image re-composition process, using depth-based comparisons, such redundant processing operations greatly decreases the efficiency of the object-division mode of parallelism.
When the anti-aliasing (AA) mode is operating during the object-division mode of parallelism, each GPU performs the correct anti-aliasing of its image fragments. However, some objects that are anti-aliased with their current background will become extrinsic to their new background when composed into the final image.
In many graphics applications, there are different lighting sources, and multi-pass applications must render the same scene geometry several times (i.e. passes), typically for different lighting calculations. Thus, the final color of a pixel of an object will be determined by blending together the results of all of the partial rendering passes. When using the object-division mode of parallelism, this multi-pass rendering increases the complexity of the image re-composition process due to the additional dependency on the stencil buffer, which operates on top of the Z buffer within each GPU.
In view, therefore of the above, there is a great need in the art for an improved method of and apparatus for carrying out parallel 3D graphics processing, while avoiding the shortcomings and drawbacks of the prior art apparatus and methodologies.