Ray tracing refers to a family of techniques for determining point-to-point visibility in a geometric scene, typically for the purpose of synthesizing an image. For example, an image of a virtual scene may be rendered by conceptually locating a virtual eyepoint and a virtual computer screen in the scene, then creating rays, called primary rays, from the eyepoint through every pixel of the screen. By computing the intersection of the primary rays with every object in the scene (i.e. tracing the rays), and selecting the first object intersected by every ray, the visible object at each pixel may be determined and the pixel colored accordingly.
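The primary-ray procedure described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the objects are spheres, the eyepoint is at the origin, and the screen is the square from (-1, -1) to (1, 1) on the plane z = -1. All function names are hypothetical.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Nearest positive distance at which a unit-direction ray hits a sphere, else None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return next((t for t in (-b - root, -b + root) if t > 1e-9), None)

def primary_ray(eye, px, py, width, height):
    """Unit direction from the eyepoint through the center of pixel (px, py).

    Assumption of this sketch: the screen lies on the plane z = -1.
    """
    x = (px + 0.5) / width * 2.0 - 1.0
    y = 1.0 - (py + 0.5) / height * 2.0
    d = [t - e for t, e in zip((x, y, -1.0), eye)]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]

def closest_hit(origin, direction, spheres):
    """Test the ray against every object and keep the first one it intersects."""
    best_id, best_t = None, math.inf
    for i, (center, radius) in enumerate(spheres):
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and t < best_t:
            best_id, best_t = i, t
    return best_id, best_t

def render_ids(eye, spheres, width, height):
    """Index of the visible object at each pixel (None where nothing is hit)."""
    return [[closest_hit(eye, primary_ray(eye, px, py, width, height), spheres)[0]
             for px in range(width)]
            for py in range(height)]
```

A shading step would then map each visible object index to a color.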
Computing this color, a process called shading, may involve tracing additional rays, called secondary rays. For example, secondary rays may be traced to determine the color of objects reflected in shiny objects (i.e. reflection rays) or to determine whether a point is in shadow by computing the visibility between a light source and the point being shaded (i.e. shadow rays). Shadow rays are a special case, since they only need to determine whether a ray segment intersects any object, and not which of multiple possible intersecting objects the ray intersects first.
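The special status of shadow rays can be illustrated with an "any hit" query that returns as soon as one blocker is found. The sketch below assumes sphere objects and a small offset `eps` to avoid self-intersection at the segment endpoints. Both the names and the offset value are assumptions of this example.

```python
import math

def segment_hits_sphere(origin, direction, max_t, center, radius, eps=1e-4):
    """True if the sphere intersects the ray strictly inside (eps, max_t - eps)."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * e for d, e in zip(direction, oc))
    disc = b * b - (sum(e * e for e in oc) - radius * radius)
    if disc < 0.0:
        return False
    root = math.sqrt(disc)
    return any(eps < t < max_t - eps for t in (-b - root, -b + root))

def in_shadow(point, light, spheres):
    """Shadow-ray query: is the segment from point to light blocked by anything?"""
    d = [l - p for l, p in zip(light, point)]
    dist = math.sqrt(sum(v * v for v in d))
    direction = [v / dist for v in d]
    for center, radius in spheres:
        if segment_hits_sphere(point, direction, dist, center, radius):
            return True  # early exit: no need to know which blocker is nearest
    return False
```

Unlike a closest-hit query, the loop never compares distances between blockers, so it can stop at the first intersection.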
In practice, ray tracing systems and techniques do not intersect every ray with every object in the scene. Instead, objects are organized into a spatial data structure, occasionally a grid, but most often a tree such as a bounding volume hierarchy (BVH) or a k-dimensional (k-d) tree. Each ray is then traversed through the tree by determining which tree nodes the ray intersects. Only objects contained by intersected nodes need to be tested for intersection with the ray.
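As a rough illustration of how a tree limits the objects tested, the sketch below walks a bounding volume hierarchy and collects only the primitives stored under nodes whose boxes the ray intersects. The `Node` layout and function names are hypothetical, and the box test is the common "slab" method rather than anything prescribed by the text.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                                     # minimum corner of the bounding box
    hi: tuple                                     # maximum corner of the bounding box
    children: list = field(default_factory=list)  # child nodes (empty for a leaf)
    prims: list = field(default_factory=list)     # primitives stored at this node

def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) pass through the axis-aligned box?"""
    t0, t1 = 0.0, math.inf
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of slabs: it must start between them.
            if o < a or o > b:
                return False
            continue
        near, far = (a - o) / d, (b - o) / d
        if near > far:
            near, far = far, near
        t0, t1 = max(t0, near), min(t1, far)
        if t0 > t1:
            return False
    return True

def candidates(node, origin, direction):
    """Primitives contained by intersected nodes; all others are skipped untested."""
    if not ray_hits_box(origin, direction, node.lo, node.hi):
        return []
    found = list(node.prims)
    for child in node.children:
        found += candidates(child, origin, direction)
    return found
```

Only the returned candidates would then need an exact ray-primitive intersection test.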
Traversing a ray through a tree data structure, such as a BVH or k-d tree, involves recursively visiting one child of a node, then, if the ray does not hit anything in that child, visiting another child. Efficient implementations usually keep an explicit stack of “children not yet visited” in an iterative loop rather than use a recursive formulation. Either case presents a potential problem for implementation on parallel processing architectures such as graphics processing units (GPUs). In parallel processing architectures, each thread tracing a ray may keep a stack of variable and potentially unlimited size. Since modern GPUs process thousands of threads at once, this possibly incurs significant storage and bandwidth costs.
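The iterative formulation with an explicit stack of "children not yet visited" might look like the following for a k-d tree. This is a sketch under stated assumptions: nodes are plain tuples, primitives are spheres, every leaf lists each primitive that overlaps its cell, and the caller supplies the finite interval `[t_min, t_max]` over which the ray overlaps the scene bounds. None of these names come from the text.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Nearest positive distance at which a unit-direction ray hits a sphere, else None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * e for d, e in zip(direction, oc))
    disc = b * b - (sum(e * e for e in oc) - radius * radius)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return next((t for t in (-b - root, -b + root) if t > 1e-9), None)

def traverse_kd(root, t_min, t_max, origin, direction, spheres):
    """Front-to-back traversal; returns (sphere index, distance) or None.

    Inner nodes are ("inner", axis, split, below, above); leaves are ("leaf", [ids]).
    """
    stack = []  # deferred far children; grows with tree depth
    node = root
    while True:
        while node[0] == "inner":
            _, axis, split, below, above = node
            o, d = origin[axis], direction[axis]
            t_split = (split - o) / d if d != 0.0 else math.inf
            below_first = o < split or (o == split and d <= 0.0)
            near, far = (below, above) if below_first else (above, below)
            if t_split >= t_max or t_split <= 0.0:
                node = near                          # ray never reaches the far side
            elif t_split < t_min:
                node = far                           # ray starts beyond the plane
            else:
                stack.append((far, t_split, t_max))  # remember the far child
                node, t_max = near, t_split
        best = None
        for i in node[1]:
            t = hit_sphere(origin, direction, *spheres[i])
            if t is not None and t <= t_max and (best is None or t < best[1]):
                best = (i, t)
        if best is not None:
            return best                              # first hit found: stop
        if not stack:
            return None                              # nothing left to visit
        node, t_min, t_max = stack.pop()
```

Each thread running this loop carries its own `stack`, which is the per-ray storage cost discussed above.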
One prior art implementation suggested a “stackless” traversal algorithm for k-d trees called k-d restart. This approach avoids the need to keep a stack by repeatedly restarting traversal from the root. When a ray is intersected with the primitive(s) in a leaf node, the ray either hits a primitive, in which case traversal is terminated, or it fails to hit any primitive, in which case the ray origin is advanced to the point where it exits the node, and traversal begins anew at the top of the tree for the new, shortened ray. Clearly, k-d restart requires much more work and higher memory bandwidth than straightforward k-d tree traversal, since nodes near the top of the tree will be fetched many times for a single ray.
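A minimal sketch of the restart idea follows, assuming tuple-based nodes, sphere primitives, leaves that list every primitive overlapping their cell, and a caller-supplied finite interval over which the ray overlaps the scene bounds. Instead of literally moving the ray origin, this sketch advances the start parameter `t_min` to the leaf's exit point, which shortens the ray in the same way. The restart counter is added here only to make the repeated root fetches visible.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Nearest positive distance at which a unit-direction ray hits a sphere, else None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * e for d, e in zip(direction, oc))
    disc = b * b - (sum(e * e for e in oc) - radius * radius)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return next((t for t in (-b - root, -b + root) if t > 1e-9), None)

def traverse_kd_restart(root, scene_t_min, scene_t_max, origin, direction, spheres):
    """Stackless traversal; returns ((sphere index, distance) or None, restart count).

    Inner nodes are ("inner", axis, split, below, above); leaves are ("leaf", [ids]).
    """
    t_min = scene_t_min
    restarts = 0
    while True:
        node, t_max = root, scene_t_max      # every pass begins at the root
        while node[0] == "inner":
            _, axis, split, below, above = node
            o, d = origin[axis], direction[axis]
            t_split = (split - o) / d if d != 0.0 else math.inf
            below_first = o < split or (o == split and d <= 0.0)
            near, far = (below, above) if below_first else (above, below)
            if t_split >= t_max or t_split <= 0.0:
                node = near
            elif t_split <= t_min:
                node = far                   # the shortened ray starts past the plane
            else:
                node, t_max = near, t_split  # far child is NOT remembered
        best = None
        for i in node[1]:
            t = hit_sphere(origin, direction, *spheres[i])
            if t is not None and t <= t_max and (best is None or t < best[1]):
                best = (i, t)
        if best is not None:
            return best, restarts
        if t_max >= scene_t_max:
            return None, restarts            # the leaf's exit is the scene's exit
        t_min = t_max                        # advance to where the ray left the leaf
        restarts += 1
```

In the usage below, a ray that misses the first two leaves descends from the root three times, which is the extra work noted above.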
Another prior art implementation independently realized that k-d tree traversal could be made “stackless” by using a “short stack.” A short stack has stack semantics (e.g. pop, push) but only keeps a small, constant number k of entries. If more than k nodes are pushed on the stack, the oldest ones are simply discarded. An attempt to pop that would normally return a node which has been discarded initiates a k-d restart (advancing the ray origin and restarting traversal at the root node).
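The short-stack data structure itself can be sketched as below. The class name, the `dropped` counter used to detect a lost entry, and the `(item, needs_restart)` return convention are assumptions of this example. A traversal loop would respond to `needs_restart` by advancing the ray origin and descending again from the root.

```python
from collections import deque

class ShortStack:
    """Stack semantics, but only the newest `capacity` entries are kept."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)
        self.dropped = 0  # entries discarded since the last restart signal

    def push(self, item):
        if len(self.entries) == self.entries.maxlen:
            self.dropped += 1      # the oldest entry is about to be lost
        self.entries.append(item)  # a full deque with maxlen discards from the left

    def pop(self):
        """Return (item, needs_restart).

        needs_restart is True when the entry that should have been returned
        was discarded earlier, so traversal must restart from the root.
        """
        if self.entries:
            return self.entries.pop(), False
        needs_restart = self.dropped > 0
        self.dropped = 0           # a restart rebuilds the stack during descent
        return None, needs_restart
```

With a capacity of 2, pushing three nodes and popping three times yields the two newest nodes followed by a restart signal. A later pop on the still-empty stack reports that traversal is finished.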
Unlike k-d tree nodes, nodes of some other spatial hierarchies (e.g. BVHs) may overlap, so there is no way to advance the ray until it exits a node without potentially advancing past some primitives in another node that the ray should have intersected. Without an equivalent of restart for BVHs, stackless or short stack BVH traversal may not be possible, forcing a full stack per thread, where a full stack typically requires one 32-bit pointer per level. The latency and bandwidth consumption from storing or spilling the stack into slow and/or off-chip memory destroys performance and makes BVH traversal uncompetitive on a parallel processing architecture such as a GPU.
There is thus a need for addressing these and/or other issues associated with the prior art.