1. Field of the Invention
The present invention generally relates to graphics processing and, more specifically, to techniques for locally modifying draw calls.
2. Description of the Related Art
In conventional graphics processing systems, a software application renders a graphics scene by first setting up the scene. In so doing, the software application defines the draw calls that are to be executed to render the graphics scene. Each draw call includes graphics state information. During setup, the software application usually pre-calculates the graphics state information for each draw call and stores this information in memory.
One drawback to this approach, however, is that the graphics state data required to render many graphics scenes is quite large. For example, to render a scene composed of 1,000,000 sunflowers, each of which includes 1,000 objects, the central processing unit (CPU) may need to pre-calculate 1,000,000,000 sets of graphics state information. As the complexity of the graphics scene increases, the size of the graphics state data increases as well. The volume of data may strain, and possibly exceed, the available system memory. Further, transferring this data from the CPU to the graphics processing unit (GPU) may saturate the available system memory bandwidth, making the transfer a bottleneck in the graphics processing pipeline and hindering overall system performance.
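The scaling in the sunflower example can be made concrete with a short calculation. The following is a minimal sketch only: the per-set size `BYTES_PER_STATE_SET` (64 bytes) is a hypothetical value chosen for illustration, and the helper names `state_sets` and `footprint_bytes` are likewise hypothetical. The 1,000,000 sunflowers and 1,000 objects per sunflower come from the example above.

```python
# Sizing sketch for the sunflower example. BYTES_PER_STATE_SET is a
# hypothetical assumption, not a figure from the text.
SUNFLOWERS = 1_000_000
OBJECTS_PER_SUNFLOWER = 1_000
BYTES_PER_STATE_SET = 64  # assumed size of one pre-calculated state set


def state_sets(instances: int, objects_per_instance: int) -> int:
    """Number of pre-calculated graphics state sets, one per object."""
    return instances * objects_per_instance


def footprint_bytes(num_sets: int, bytes_per_set: int) -> int:
    """Memory needed to store every pre-calculated state set."""
    return num_sets * bytes_per_set


if __name__ == "__main__":
    n = state_sets(SUNFLOWERS, OBJECTS_PER_SUNFLOWER)
    total = footprint_bytes(n, BYTES_PER_STATE_SET)
    print(f"{n:,} state sets")         # 1,000,000,000 state sets
    print(f"{total / 2**30:.1f} GiB")  # 59.6 GiB under the assumed set size
```

Because the footprint is the product of instance count, objects per instance, and bytes per set, doubling any one factor doubles the memory that must be stored and transferred to the GPU.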
In another approach, the amount of system memory used to render a graphics scene can be reduced by delaying one or more of the calculations for some draw calls until the GPU requests this information during rendering. For example, to determine which objects to render during culling, the GPU and CPU may work together to determine the draw calls and the included graphics state information. More specifically, the GPU passes relevant information to the CPU, the CPU processes this information and passes the results back to the GPU, and only then does the GPU start to render. Unfortunately, this approach requires synchronization between the GPU and CPU. The CPU may stall while waiting for the GPU to generate the relevant information (e.g., while the GPU is still rendering previous frames), and the GPU may stall while waiting for the CPU to process that information. Thus, although such a technique reduces memory usage, the synchronization operations can stall the graphics processing pipeline, thereby negatively impacting overall system performance. Furthermore, this approach increases the amount of data transferred between the GPU and CPU, exacerbating any existing system memory bandwidth limitations.
As the foregoing illustrates, what is needed in the art is a more effective way to generate draw calls and the graphics state information associated with those draw calls.