1. Field of the Invention
The present invention generally relates to three-dimensional (3D) graphics processing, and, more particularly, to fine-grained traversal for ray tracing.
2. Description of the Related Art
As used in the field of computer graphics, ray tracing is a technique for generating a realistic graphic image by tracing the path of light through the pixels in an image plane, such as the surface of a display device. Each path of light (or ray) is oriented to pass through one of the pixels in the image plane. Ray tracing then simulates the effects of each ray encountering objects in a three-dimensional (3D) graphics environment. At the point of contact with each object, the ray may reflect, refract, scatter, or disperse.
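By way of illustration only, the orientation of one ray per pixel may be sketched as follows. The sketch assumes a pinhole eye point at the origin, an image plane one unit in front of the eye along the negative z axis, and a 90 degree vertical field of view. These conventions, and the function name `primary_ray_directions`, are assumptions made for this example and are not taken from the description above.

```python
import math


def primary_ray_directions(width, height, fov_degrees=90.0):
    """Return one unit-length ray direction per pixel, in row-major order.

    Assumed conventions (hypothetical, for illustration): the eye is at the
    origin, the image plane sits at z = -1, +x points right and +y points up.
    """
    aspect = width / height
    half_extent = math.tan(math.radians(fov_degrees) / 2.0)
    directions = []
    for row in range(height):
        for col in range(width):
            # Map the pixel center to the range [-1, 1] on the image plane.
            x = (2.0 * (col + 0.5) / width - 1.0) * aspect * half_extent
            y = (1.0 - 2.0 * (row + 0.5) / height) * half_extent
            z = -1.0
            # Normalize so that every ray direction has unit length.
            length = math.sqrt(x * x + y * y + z * z)
            directions.append((x / length, y / length, z / length))
    return directions
```

Each returned direction corresponds to one ray that would subsequently be tested against the objects in the 3D graphics environment. The intersection and shading steps are omitted from this sketch.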
Typically, the calculations to perform ray tracing are computationally intensive. In order to improve performance, ray tracing may be accelerated by tracing a set of rays simultaneously using a highly parallel computing device such as a graphics processing unit (GPU). Such parallel processing devices include single-instruction multiple-thread (SIMT) and single-instruction multiple-data (SIMD) processors that execute each instruction on a group of parallel threads or parallel data lanes. These parallel processors achieve high throughput because the same instructions are performed on multiple data sets in a highly parallel manner. Alternatively, a large number of generally synchronized threads may be executed in parallel using a common instruction unit, where different threads may follow divergent execution paths through a given thread program.
For example, a GPU with 32 computational units could process 32 rays simultaneously by assigning one ray to each of the 32 computational units. One problem with this approach is that processing for one ray may complete in a relatively short period of time while processing for another ray may take a relatively long period of time. As a result, some of the 32 computational units may complete processing their assigned rays and enter an idle state pending completion of processing for all 32 rays. In such a case, performance is reduced because the idle computational units wait for the other computational units to complete processing, and thus do not process rays or perform other computational tasks.
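The cost of such idling may be estimated with a simple model, sketched below under stated assumptions. The model assumes that the group of rays advances in lockstep, that each ray needs a known number of processing steps, and that the group is not released until its slowest ray completes. The function name `lockstep_utilization` and the step counts in the usage example are hypothetical and chosen only for illustration.

```python
def lockstep_utilization(steps_per_ray):
    """Return the fraction of unit-steps spent on useful work.

    Assumed model (hypothetical): one ray per computational unit, and every
    unit stays occupied until the slowest ray in the group has finished.
    """
    if not steps_per_ray:
        raise ValueError("at least one ray is required")
    group_steps = max(steps_per_ray)  # the group runs as long as the slowest ray
    if group_steps == 0:
        return 1.0  # nothing to do, so no unit-steps are wasted
    useful = sum(steps_per_ray)  # unit-steps that actually advance a ray
    total = len(steps_per_ray) * group_steps  # unit-steps occupied by the group
    return useful / total


# Usage: 31 rays that finish in 10 steps each and one ray that needs 100 steps.
example_steps = [10] * 31 + [100]
print(lockstep_utilization(example_steps))  # 410 / 3200 = 0.128125
```

Under this model, a single long-running ray leaves roughly 87 percent of the available unit-steps idle in the example group, whereas a group of rays with equal step counts yields a utilization of 1.0.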
As the foregoing illustrates, what is needed in the art is an improved technique for ray tracing.