1. Field of the Invention
The present invention relates generally to the field of graphics processing and more specifically to a system and method for using cull streams for fine-grained rendering predication.
2. Description of the Related Art
A typical computing system includes a central processing unit (CPU) and a graphics processing unit (GPU). Some GPUs achieve very high performance by running a relatively large number of small, parallel execution threads on dedicated programmable hardware processing units. This specialized design usually allows such GPUs to perform certain tasks, such as rendering three-dimensional (3D) scenes, much faster than a CPU. However, it also limits the types of tasks the GPU can perform. The CPU, by contrast, is a more general-purpose processing unit and can perform most tasks. Consequently, the CPU usually executes the overall structure of the software application and configures the GPU to perform specific tasks in a graphics pipeline.
One task that may be performed when transforming 3D scenes into two-dimensional (2D) images is culling. In a typical graphics scene, a substantial percentage of the graphics primitives sent by the application to the GPU have no effect on the rendered image. Some primitives lie outside the view volume, some are back-facing, and some are occluded by other primitives. Typically, more than 60% of the primitives fall into one of these categories. Culling is therefore used to reduce the processing load and to avoid rendering primitives that are not visible. A typical GPU discards such primitives as early in the pipeline as possible. However, culling itself consumes processing time and introduces bubbles into the GPU workflow that reduce overall processing efficiency.
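Two of the culling categories above, view-volume culling and back-face culling, reduce to simple geometric tests on each primitive. The following is a minimal Python sketch of those two tests, not the method of the invention. It assumes triangles have already been projected to 2D normalized device coordinates and that front faces wind counter-clockwise. All function names are hypothetical. Occlusion is not tested here because it depends on the depth of other primitives.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[Vec2, Vec2, Vec2]


def signed_area2(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area of triangle abc; positive for counter-clockwise winding (y up)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_back_facing(tri: Triangle) -> bool:
    # Assumption: front faces wind counter-clockwise, so a non-positive area
    # means the triangle faces away from the viewer (or is degenerate).
    return signed_area2(*tri) <= 0.0


def outside_view(tri: Triangle, lo: float = -1.0, hi: float = 1.0) -> bool:
    # Conservative trivial reject: every vertex lies beyond the same edge
    # of the [lo, hi] x [lo, hi] view rectangle.
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return max(xs) < lo or min(xs) > hi or max(ys) < lo or min(ys) > hi


def cull(triangles: List[Triangle]) -> List[Triangle]:
    """Keep only triangles that survive both the view-volume and back-face tests."""
    return [t for t in triangles if not outside_view(t) and not is_back_facing(t)]


front = ((0.0, 0.0), (0.5, 0.0), (0.0, 0.5))      # counter-clockwise, on screen
back = ((0.0, 0.0), (0.0, 0.5), (0.5, 0.0))       # clockwise, so back-facing
offscreen = ((2.0, 2.0), (3.0, 2.0), (2.0, 3.0))  # entirely right of x = 1
print(cull([front, back, offscreen]))  # only `front` survives
```

Because the view-volume test is only a trivial reject, a triangle that straddles a corner outside the view may be kept. This is safe, since later clipping still removes it.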
One type of culling technique is occlusion culling, in which the GPU determines how many samples of an object pass both a Z-test (depth test) and a stencil test. One drawback of this approach is that the GPU must transmit the results of these tests back to the CPU. Because the CPU runs ahead of the GPU in the command stream, it must wait for those results before it can decide which commands to submit next.
To overcome this drawback, the technique of predicated rendering (conditional rendering) was introduced. In predicated rendering, the result of an occlusion query on one object is used to predicate the rendering of another object. The GPU evaluates the query result itself, so the result does not have to be sent back to the CPU. For example, the CPU may issue commands to render a bounding volume of a 3D object. If any part of the bounding volume passes both the Z-test and the stencil test (i.e., part of the bounding volume is visible), the GPU renders the actual 3D object. If no samples of the bounding volume pass both the Z-test and the stencil test, the 3D object is not rendered. The bounding volume of the object is the "predicate" for the predicated rendering operation.
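The decision flow of predicated rendering can be modeled in software. This hypothetical Python sketch treats the bounding volume as a list of already-rasterized samples. It applies a LESS depth test and an EQUAL stencil test without writing to the framebuffer, and it draws the object only if at least one sample passes. Real graphics APIs expose the same idea through occlusion queries and conditional rendering (for example, `GL_SAMPLES_PASSED` queries used with `glBeginConditionalRender` in OpenGL). However, every name below is an illustrative assumption, not an actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[int, int, float]  # (x, y, depth) of one sample covered by the bounding volume


@dataclass
class Framebuffer:
    depth: List[List[float]]   # indexed [y][x]; smaller values are closer
    stencil: List[List[int]]   # indexed [y][x]


def occlusion_query(fb: Framebuffer, samples: List[Sample], stencil_ref: int = 0) -> int:
    """Count samples passing a LESS depth test and an EQUAL stencil test.
    The framebuffer is not modified, so the bounding volume itself stays invisible."""
    return sum(1 for x, y, z in samples
               if z < fb.depth[y][x] and fb.stencil[y][x] == stencil_ref)


def predicated_draw(fb: Framebuffer, bounding_samples: List[Sample],
                    draw_object: Callable[[], None]) -> bool:
    # The sample count collapses to one bit: draw the whole object or skip it.
    visible = occlusion_query(fb, bounding_samples) > 0
    if visible:
        draw_object()
    return visible


fb = Framebuffer(depth=[[0.5, 0.5], [0.5, 0.5]], stencil=[[0, 0], [0, 0]])
drawn: List[str] = []
hidden_box = [(0, 0, 0.9), (1, 0, 0.9)]   # every sample lies behind existing geometry
peeking_box = [(0, 0, 0.9), (1, 1, 0.2)]  # one sample is in front
predicated_draw(fb, hidden_box, lambda: drawn.append("hidden"))
predicated_draw(fb, peeking_box, lambda: drawn.append("peeking"))
print(drawn)  # ['peeking']
```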
Although predicated rendering functions as intended, it has a significant limitation: an occlusion query provides only one bit of information to the system, namely whether or not the entire 3D object is occluded. Developers and designers are increasingly creating larger and more complex objects that are rendered with a single draw call or only a few draw calls. Because of the complexity of these objects, the results of predicated rendering are very coarse. For example, if an object contains 1000 primitives and only one of them is visible, the entire 3D object is rendered. This happens because the occlusion query reports only whether any part of the object, however small, is visible. If this coarseness ultimately causes most or all of the 3D objects in a graphics scene to be deemed visible, then occlusion culling only adds cost, through a more complicated command stream, with no corresponding improvement in overall performance. In addition, if part of an object's bounding volume is visible but none of the object itself is, the full object is still rendered, because the predicate tests the bounding volume rather than the object.
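The cost of the one-bit predicate in the 1000-primitive example can be made concrete with a little arithmetic. The sketch below is a hypothetical model, not part of the described system. It takes the visibility of the bounding volume and of each primitive as given inputs and reports how many primitives are rendered and how many of those are wasted.

```python
from typing import List, Tuple


def object_level_predication(bbox_visible: bool, prim_visible: List[bool]) -> Tuple[int, int]:
    """Return (rendered, wasted) primitive counts when a single occlusion-query
    bit for the bounding volume predicates the entire object."""
    if not bbox_visible:
        return 0, 0  # the whole draw call is skipped
    rendered = len(prim_visible)           # every primitive is drawn
    wasted = rendered - sum(prim_visible)  # drawn but contributes nothing
    return rendered, wasted


# 1000 primitives, exactly one visible: 999 are rendered for nothing.
one_visible = [False] * 999 + [True]
print(object_level_predication(True, one_visible))     # (1000, 999)

# Bounding volume visible, object fully hidden: all 1000 are wasted.
print(object_level_predication(True, [False] * 1000))  # (1000, 1000)
```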
In an attempt to overcome the limitations of predicated rendering, the overall bounding volume of the 3D object may be divided into smaller sub-bounding volumes. Each sub-bounding volume is then queried separately, and only the subparts of the 3D object whose sub-bounding volumes are visible are rendered. Although sub-bounding volumes provide finer granularity, generating and rendering each of them adds overhead, which decreases overall performance. Furthermore, as described above, a portion of a sub-bounding volume may pass the culling query even when none of the enclosed subparts of the object is visible, causing those subparts to be rendered needlessly.
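The trade-off described above can be modeled the same way. In this hypothetical sketch, the object's primitives are split into consecutive chunks. Each chunk has its own sub-bounding volume whose query result is supplied as input, and every chunk costs one query whether or not it turns out to be visible. The chunking scheme and all names are assumptions for illustration.

```python
from typing import List, Tuple


def sub_volume_predication(n_prims: int, chunk: int,
                           chunk_visible: List[bool]) -> Tuple[int, int]:
    """Return (primitives rendered, occlusion queries issued) when each chunk of
    `chunk` consecutive primitives is predicated on its own sub-bounding volume."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    n_chunks = (n_prims + chunk - 1) // chunk  # ceiling division; last chunk may be short
    if len(chunk_visible) != n_chunks:
        raise ValueError("need one query result per chunk")
    rendered = sum(min(chunk, n_prims - i * chunk)
                   for i, visible in enumerate(chunk_visible) if visible)
    return rendered, n_chunks  # every chunk costs one query, visible or not


# 1000 primitives in chunks of 100; only one chunk's sub-volume passes:
print(sub_volume_predication(1000, 100, [True] + [False] * 9))  # (100, 10)
```

In the single-query case, all 1000 primitives are rendered after one query. Here only 100 primitives are rendered, but ten queries must be issued. If a sub-volume passes while none of its primitives is visible, its whole chunk is still drawn.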
Accordingly, there remains a need in the art for a finer-grained predicated rendering technique that does not add costly overhead to the command stream.