In some processor cores of current parallel multiprocessors, numerous threads may be executed concurrently. Furthermore, the threads may be packed together into groups called “warps,” which are executed in a single instruction multiple data (SIMD) fashion, meaning that at any one instant, all active threads within a warp execute the same instruction on their own private data values. If different threads within a warp need to execute different instructions, the differing instructions are executed one after another, and the threads that do not need the instruction currently being executed sit idle. This condition, known as divergence, is often undesirable because idle threads perform no useful work, thus reducing total computational throughput.
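The cost of divergence can be illustrated with a small model. The following is a minimal sketch, not a description of any particular hardware: it assumes a warp width of 32 (`WARP_SIZE`), and it assumes each distinct instruction path is issued exactly once while all other threads idle. The names `simulate_warp`, `uniform`, and `split` are hypothetical and introduced only for this illustration.

```python
from collections import defaultdict

WARP_SIZE = 32  # assumed warp width; the actual width depends on the hardware


def simulate_warp(branch_of_thread):
    """Return (serialized passes, lane utilization) for one warp.

    branch_of_thread[i] names the instruction path thread i wants to take.
    Assumption of this model: each distinct path is issued once, in turn,
    and only the threads that chose that path are active during the pass.
    """
    if not branch_of_thread:
        raise ValueError("a warp needs at least one thread")

    # Count how many threads want each path.
    lanes_per_path = defaultdict(int)
    for path in branch_of_thread:
        lanes_per_path[path] += 1

    passes = len(lanes_per_path)
    active_slots = sum(lanes_per_path.values())   # one useful slot per thread
    total_slots = passes * len(branch_of_thread)  # every lane is held for every pass
    return passes, active_slots / total_slots


# All threads agree: one pass, every lane busy.
uniform = ["a"] * WARP_SIZE
# Even and odd threads disagree: two passes, half of the lanes idle in each.
split = ["a" if i % 2 == 0 else "b" for i in range(WARP_SIZE)]

print(simulate_warp(uniform))  # (1, 1.0)
print(simulate_warp(split))    # (2, 0.5)
```

Under these assumptions, a two-way split halves utilization, and a warp in which every thread takes its own path would fall to 1/32 utilization.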
The foregoing parallel multiprocessors are suited to many different applications. For instance, much effort has been made to adapt ray tracing algorithms to work well with such architectures. Ray tracing is a technique for determining the visibility of an object or objects from a given point, such as, but not limited to, an “eye” or “camera” point, by following a ray. While such techniques can theoretically involve testing a vast number of rays against each and every geometric primitive, this is typically not practical. Instead, designers have used various data structures to identify a subset of such primitives to be involved in the testing, thereby reducing the required processing.
To accomplish this, objects are typically organized in a tree-structured spatial hierarchy, such as a bounding volume hierarchy (BVH), a kd-tree (k-dimensional tree), or a binary space partitioning (BSP) tree. However, determining how to traverse such a tree efficiently and find the object or objects intersected by a given ray may pose a challenge, particularly with single instruction multiple data processing architectures. For example, when formulating ray tracing algorithms on a graphics processing unit (GPU), care must be taken when assigning rays and traversal tests to the various parallel threads of execution, in order to minimize the aforementioned divergence that arises when different threads in a warp make different decisions. There is thus a need for addressing these and/or other issues associated with the prior art.
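For illustration, the way a spatial hierarchy narrows the set of primitives to test can be sketched as a stack-based traversal of a small bounding volume hierarchy. This is a minimal sketch under stated assumptions, not a definitive implementation: nodes are axis-aligned boxes, primitives are represented only by integer indices, and the names `Node`, `ray_hits_box`, and `candidate_primitives` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                                   # minimum corner of the bounding box
    hi: Vec3                                   # maximum corner of the bounding box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    primitives: List[int] = field(default_factory=list)  # filled for leaves only


def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does origin + t * direction (t >= 0) pass through the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes; it must start between them.
            if o < a or o > b:
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def candidate_primitives(root, origin, direction):
    """Collect primitive indices from every leaf whose box the ray enters."""
    stack, found = [root], []
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue  # the whole subtree is skipped
        if node.left is None and node.right is None:
            found.extend(node.primitives)
        else:
            # Push right first so that the left child is visited first.
            stack.extend(c for c in (node.right, node.left) if c is not None)
    return found


# Two leaves side by side along x, each holding two primitives.
left_leaf = Node((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), primitives=[0, 1])
right_leaf = Node((4.0, 0.0, 0.0), (5.0, 1.0, 1.0), primitives=[2, 3])
root = Node((0.0, 0.0, 0.0), (5.0, 1.0, 1.0), left=left_leaf, right=right_leaf)

# A ray travelling along +z through the left box only.
print(candidate_primitives(root, (0.5, 0.5, -1.0), (0.0, 0.0, 1.0)))  # [0, 1]
```

In this sketch only two of the four primitives remain as candidates for exact intersection testing. Note that the loop's control flow depends on the ray: if each thread of a warp ran this loop for a different ray, the threads could pop different nodes and take different branches, which is the source of divergence discussed above.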