In some processor cores of current parallel multiprocessors, numerous threads may be executed concurrently. Furthermore, the threads may be packed together into groups, here called “warps,” which are executed in a single instruction multiple data (SIMD) fashion, meaning that at any one instant, all threads within a warp execute the same instruction on their own private data values. If different threads within a warp need to execute different instructions, the warp must execute each required operation in turn, while the threads that do not need that operation sit idle. This condition, known as divergence, is often undesirable because idle threads perform no useful work, thus reducing total computational throughput.
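By way of illustration only, the cost of divergence may be estimated with a simple model. The sketch below assumes that a warp executes one serialized pass per distinct operation requested by its threads and that every pass takes the same time; the function name `simd_utilization` and the operation labels are hypothetical and are not drawn from any particular architecture.

```python
from collections import Counter

def simd_utilization(lane_ops):
    """Fraction of lane-slots doing useful work in one divergent step.

    lane_ops holds one entry per thread (lane) of the warp, naming the
    operation that thread needs to execute next. Assumption: the warp runs
    one serialized pass per distinct operation, each of equal cost.
    """
    warp_size = len(lane_ops)
    groups = Counter(lane_ops)        # lanes grouped by the operation they need
    passes = len(groups)              # one pass per distinct operation
    useful = sum(groups.values())     # each lane is active in exactly one pass
    return useful / (warp_size * passes)

uniform = ["add"] * 8                 # every thread needs the same operation
split = ["add"] * 4 + ["mul"] * 4     # the warp diverges into two groups

print(simd_utilization(uniform))  # 1.0
print(simd_utilization(split))    # 0.5
```

Under these assumptions, a warp whose threads split evenly between two operations achieves half of its peak throughput for that step.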
The foregoing parallel multiprocessors may be used for many different applications. For instance, much effort has been made to adapt ray tracing algorithms to work well with such architectures. Ray tracing is a technique for determining the visibility of an object or objects from a given point, such as, but not limited to, an “eye” or “camera” point, by following a ray. While such techniques theoretically can involve testing a vast number of rays against each and every geometric primitive, this is typically not practical. Instead, designers have used various data structures to identify a subset of such primitives to be involved in the testing, thereby reducing required processing.
To accomplish this, objects are typically organized in a tree-structured spatial hierarchy, such as a bounding volume hierarchy (BVH), a kd-tree (k-dimensional tree), or a binary space partitioning (BSP) tree. However, determining how to traverse such a tree efficiently and find the object or objects that are intersected by a given ray may pose a challenge, particularly with SIMD processing architectures. For example, when formulating ray tracing algorithms on a graphics processing unit (GPU), care must be taken when assigning rays and traversal tests to the various parallel threads of execution, in order to minimize the aforementioned divergence that arises when different threads in a warp make different decisions.
In the context of ray tracing using a SIMD architecture, each thread may trace a ray through a data structure. In this case, each ray will need to be traversed through one or more nodes until it reaches a node containing one or more primitives, at which point the ray may be intersected with the set of primitives (e.g. triangles, splines, spheres, etc.) in that node. If the ray intersects a primitive within the node, the ray is complete, the intersection may be shaded (e.g. colored), and more rays may be generated. If no primitive is intersected in the node, the ray resumes traversing nodes until it reaches another non-empty node and again starts intersecting primitives.
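The alternation between traversal and intersection described above may be sketched, for a single ray, as follows. This is a deliberately reduced illustration rather than any particular implementation: it assumes a one-dimensional scene in which a ray travels in the +x direction, primitives are points on the x axis, and the hierarchy is a small kd-tree visited front to back. The names `trace`, `leaf`, and `inner` are hypothetical.

```python
def trace(node, origin):
    """Return the position of the first primitive at or beyond origin, or None."""
    stack = []                                  # subtrees still to visit, nearest on top
    while True:
        # Traversal phase: descend until a leaf node is reached.
        while "split" in node:
            if origin < node["split"]:
                stack.append(node["right"])     # far child, visited later if needed
                node = node["left"]
            else:
                node = node["right"]            # left child lies entirely behind the ray
        # Intersection phase: test every primitive stored in the leaf.
        hits = [p for p in node["prims"] if p >= origin]
        if hits:
            return min(hits)                    # ray is complete; the hit may now be shaded
        if not stack:
            return None                         # ray left the scene without a hit
        node = stack.pop()                      # resume traversal at the next nearest subtree

def leaf(*prims):
    return {"prims": list(prims)}

def inner(split, left, right):
    return {"split": split, "left": left, "right": right}

# Toy scene covering [0, 8): primitives at 1.5, 6.0 and 6.5; two leaves are empty.
tree = inner(4.0,
             inner(2.0, leaf(1.5), leaf()),
             inner(6.0, leaf(), leaf(6.0, 6.5)))

print(trace(tree, 0.0))  # 1.5
print(trace(tree, 2.5))  # 6.0
print(trace(tree, 7.0))  # None
```

The ray starting at 2.5 passes through two empty leaves before reaching a primitive, whereas the ray starting at 0.0 hits in the first leaf it reaches, so the two rays spend different numbers of steps in each phase.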
During this traversal and intersection cycle, different factors may cause SIMD divergence. For example, rays may hit a primitive and terminate such that the corresponding thread is masked off and inactive for all future traversal and intersection phases. In another case, rays may finish a traversal step at different times, such that the corresponding threads must wait for the remaining threads in the SIMD group to finish traversing before the group may enter the intersection phase. Additionally, rays may intersect primitives bound to different shaders such that when the intersections enter a shade phase, different code is executed. There is thus a need for addressing these and/or other issues associated with the prior art.
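The first two factors may be quantified with a rough model. The sketch below assumes that a SIMD group enters the intersection phase only after its slowest active ray has finished traversing, and that a lane whose ray has terminated remains masked off for the whole phase; the function name `idle_lane_slots` and the step counts are hypothetical.

```python
def idle_lane_slots(traversal_steps):
    """Lane-slots wasted during one traversal phase of a SIMD group.

    traversal_steps holds one entry per lane: the number of traversal steps
    its ray needs, or None if the ray has already terminated (masked off).
    """
    active = [s for s in traversal_steps if s is not None]
    longest = max(active, default=0)                 # the group waits for the slowest ray
    waiting = sum(longest - s for s in active)       # lanes that finished early and wait
    masked = longest * (len(traversal_steps) - len(active))  # terminated lanes idle throughout
    return waiting + masked

print(idle_lane_slots([3, 3, 3, 3]))     # 0
print(idle_lane_slots([1, 4, 2, None]))  # 9
```

In the second call, the three active lanes wait for 3, 0, and 2 steps respectively, and the masked lane is idle for all 4 steps.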