The present invention relates to systems and methods for traversing hierarchical structures, and more particularly to systems and methods for traversing treelet-composed hierarchical structures.
Hierarchical structures, such as logical tree structures, are known in many technical fields, and are employed to organize information in a logical form to facilitate storage and retrieval of the information. In a typical implementation, the highest node or “root” of the logical tree includes the most general information, with descendant nodes (i.e., child nodes, grandchild nodes, etc. moving away from the root node) providing additional detail as to a particular aspect of the information represented by the tree structure. It is, of course, desirable to navigate through the tree via the shortest path and/or in the shortest amount of time in order to store or retrieve information, and node traversal techniques for minimizing the time to perform these operations occupy engineers and scientists from a variety of different fields.
In the areas of graphics processing and rendering, ray tracing is a field which uses hierarchical structures for organizing information. Ray tracing is a technique for determining the visibility of a primitive from a given point in space, for example, an eye or camera perspective. Primitives of a particular scene which are to be rendered are typically located in nodes, and the nodes are organized within a hierarchical tree. Ray tracing involves a first operation of “node traversal,” whereby nodes of the tree are traversed in a particular manner in an attempt to locate nodes having primitives that may intersect a ray, and a second operation of “primitive intersection,” in which a ray is intersected with one or more primitives within a located node to produce a particular visual effect. The hierarchical structure and the primitives (collectively referred to as scene data herein) can be very large, and generally do not fit into a reasonably-sized cache.
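By way of illustration only, the two operations described above may be sketched as follows. The sketch assumes a bounding volume hierarchy with axis-aligned boxes and spherical primitives; the names `Sphere`, `Node`, `hits_box`, `hit_sphere` and `trace` are hypothetical and are not taken from any particular implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]

@dataclass
class Sphere:                      # hypothetical primitive type
    center: Vec
    radius: float

@dataclass
class Node:                        # hypothetical tree node with an axis-aligned bounding box
    lo: Vec
    hi: Vec
    children: List["Node"] = field(default_factory=list)
    prims: List[Sphere] = field(default_factory=list)

def hits_box(o: Vec, d: Vec, lo: Vec, hi: Vec) -> bool:
    """Slab test: does the ray o + t*d (t >= 0) overlap the box [lo, hi]?"""
    t0, t1 = 0.0, float("inf")
    for a in range(3):
        if d[a] == 0.0:
            # Ray is parallel to this slab; it must start inside it.
            if not (lo[a] <= o[a] <= hi[a]):
                return False
            continue
        ta, tb = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return False
    return True

def hit_sphere(o: Vec, d: Vec, s: Sphere) -> Optional[float]:
    """Nearest t >= 0 at which the ray hits s, or None. Assumes d has unit length."""
    oc = tuple(o[i] - s.center[i] for i in range(3))
    b = sum(oc[i] * d[i] for i in range(3))
    c = sum(x * x for x in oc) - s.radius ** 2
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    t = -b - root
    if t < 0.0:
        t = -b + root
    return t if t >= 0.0 else None

def trace(root: Node, o: Vec, d: Vec):
    """Return (t, primitive) for the closest hit, or None if nothing is hit."""
    best = None
    stack = [root]
    while stack:
        node = stack.pop()
        # Node traversal: skip subtrees whose bounding box the ray misses.
        if not hits_box(o, d, node.lo, node.hi):
            continue
        stack.extend(node.children)
        # Primitive intersection: test the primitives stored in this node.
        for s in node.prims:
            t = hit_sphere(o, d, s)
            if t is not None and (best is None or t < best[0]):
                best = (t, s)
    return best

# Small example scene: two leaf nodes under one root.
near = Sphere((0.0, 0.0, 5.0), 1.0)
side = Sphere((5.0, 0.0, 5.0), 1.0)
scene = Node((-1, -1, 4), (6, 1, 6), children=[
    Node((-1, -1, 4), (1, 1, 6), prims=[near]),
    Node((4, -1, 4), (6, 1, 6), prims=[side]),
])
print(trace(scene, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

In this example scene, a ray cast from the origin along the positive z axis passes through the first leaf node only and reaches the sphere `near` at t = 4, while the second leaf node is rejected by the bounding-box test without any of its primitives being examined.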
In advanced rendering algorithms such as global illumination methods, most of the rays are incoherent. Rays can be considered coherent when it is possible to statically arrange the rays in “groups” so that most of the rays in a group access roughly the same parts of the scene data (and thus memory). In these cases caches can be effective, since the working set of a group of rays is small. However, groups of incoherent rays tend to diverge during hierarchical structure traversal, so that their memory accesses are not localized and caches no longer help.
Navratil et al. in “Dynamic Ray Scheduling to Improve Ray Coherence and Bandwidth Utilization” propose a solution to address this problem, in which scene data is partitioned into treelets of a hierarchical structure, with each treelet assigned to a queue. Whenever a ray moves to a different treelet during traversal, its processing is suspended and the ray and corresponding traversal state (collectively referred to as ray-state herein) are pushed into the queue of that treelet. Once a treelet has been fetched into an L1 cache of the processing element operating upon the ray-states in its queue, almost all scene data requests are serviced from the cache before moving to the next queue. As a result, a very significant reduction in scene data-related memory traffic is made possible.
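By way of illustration only, the queueing scheme described above may be sketched as a small simulation. The sketch is not the algorithm of Navratil et al.; it makes several simplifying assumptions: each ray is represented by a precomputed list of the node identifiers it visits, the cache holds exactly one treelet at a time, and the scheduler always selects the fullest queue. The names `run_queued`, `run_per_ray`, `paths` and `treelet_of` are hypothetical.

```python
from collections import defaultdict, deque

def run_queued(paths, treelet_of):
    """Process rays treelet by treelet.

    paths:      ray id -> ordered list of node ids the ray visits (stand-in for real traversal)
    treelet_of: node id -> treelet id
    Returns (treelet loads, queue pushes).
    """
    queues = defaultdict(deque)
    pushes = 0
    for ray, path in paths.items():
        if path:
            # A ray-state here is (ray id, index of the next node to visit).
            queues[treelet_of[path[0]]].append((ray, 0))
            pushes += 1
    loads = 0
    while True:
        pending = [t for t, q in queues.items() if q]
        if not pending:
            break
        # Assumed policy: service the treelet with the most waiting ray-states.
        current = max(pending, key=lambda t: len(queues[t]))
        loads += 1                      # treelet fetched into the cache once
        q = queues[current]
        for _ in range(len(q)):
            ray, i = q.popleft()
            path = paths[ray]
            # Advance the ray while it stays inside the cached treelet.
            while i < len(path) and treelet_of[path[i]] == current:
                i += 1
            if i < len(path):
                # Ray entered another treelet: suspend it and enqueue its state there.
                queues[treelet_of[path[i]]].append((ray, i))
                pushes += 1
    return loads, pushes

def run_per_ray(paths, treelet_of):
    """Baseline: trace each ray to completion; count loads of a one-treelet cache."""
    loads, cached = 0, None
    for path in paths.values():
        for node in path:
            if treelet_of[node] != cached:
                cached = treelet_of[node]
                loads += 1
    return loads

# Six nodes grouped into three treelets, and four rays with diverging paths.
treelet_of = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}
paths = {"r0": [0, 1, 2, 3], "r1": [0, 1, 4, 5], "r2": [0, 2, 3], "r3": [0, 1, 4]}

print(run_queued(paths, treelet_of))
print(run_per_ray(paths, treelet_of))
```

For this example input, the queued scheme loads each of the three treelets once, whereas tracing the rays one at a time reloads a treelet eight times. The eight queue pushes counted by `run_queued` correspond to the queue-related accesses that the scheme incurs in exchange.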
A difficulty with this approach, however, is that memory traffic caused by queue-related accesses becomes a significant problem, and consequently the potential for significant data throughput is severely diminished.
Accordingly, what is needed is an improved technique for performing node traversal operations in a treelet-composed hierarchical structure.