Volume rendering is a visualization technique for rendering volumetric (three-dimensional) data. A voxel is an element of volumetric data. The term combines the first syllables of “volume” and “picture element,” analogous to the two-dimensional term pixel, made from “picture” and “element.” Volumetric data can be acquired from medical scanners or seismic acoustic collection devices, or generated by algorithmic methods on a computer system. Volume rendering enables the visualization of sampled functions of three spatial dimensions by computing 2-D projections of a colored semitransparent volume. Volume rendering is distinguished from surface rendering in that a volume contains both exterior surface characteristics and internal characteristics. Currently, the major application areas for volume rendering are medical imaging and seismic interpretation.
Volume rendering can be accomplished using a variety of techniques, but they generally fall into two categories: direct and indirect methods. Direct methods are distinguished from indirect methods by how they interpret the data collected. An example of a direct method is ray casting, which selects voxels from the data along a ray cast from the viewing position through the volume, blends them, and then assigns the result to a particular pixel location on the display system. Ray casting (also known as ray tracing, or volume ray casting) is a method used to render high-quality images of objects. Most ray tracing algorithms are implemented on systems with preemptive multitasking, virtual memory management, and multi-level caches, because as the rays traverse the data they typically jump to non-contiguous memory locations, resulting in many cache misses.
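The blending step of ray casting can be illustrated with a minimal sketch. It assumes front-to-back compositing with the common "over" operator and a single gray-scale color channel; the text above does not specify a blending rule, so both choices, along with the function name composite_ray and the termination threshold, are illustrative assumptions.

```python
def composite_ray(samples, opacity_cutoff=0.999):
    """Blend (color, opacity) samples, ordered front to back, into one pixel.

    Assumption: the "over" operator on a single gray-scale channel.
    Returns the accumulated (color, opacity) for the pixel.
    """
    color_acc = 0.0
    alpha_acc = 0.0
    for color, alpha in samples:
        # Each sample contributes only through what is still transparent.
        weight = (1.0 - alpha_acc) * alpha
        color_acc += weight * color
        alpha_acc += weight
        # Early ray termination: later samples are effectively hidden.
        if alpha_acc >= opacity_cutoff:
            break
    return color_acc, alpha_acc
```

For instance, a half-opaque white sample in front of a half-opaque black sample gives a pixel with color 0.5 and accumulated opacity 0.75.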
Indirect methods such as marching cubes look for specific values in the volumetric data and then try to extract a set of voxels with those values. If the exact values do not exist in the data, the algorithm looks for voxels with values that are less than and greater than the requested value and interpolates between both voxel points to compute a new point for the requested value. The points are then connected to form a set of triangles which are sorted based on distance from the viewer then rendered to the computer display as filled surfaces. This is also known as isosurface extraction. These two methods differ not only in how they examine the volumetric data, but in how they render the volumetric data. Both methods result in various performance impacts such as cache misses, page faults, and task switching.
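The interpolation step described above can be sketched as follows. This is only the edge-crossing computation, not a full marching cubes implementation; it assumes linear interpolation between two neighboring voxel positions, and the function name interpolate_crossing is hypothetical.

```python
def interpolate_crossing(p0, v0, p1, v1, iso):
    """Return the point on the edge p0-p1 where the value equals iso.

    p0, p1: (x, y, z) positions of two neighboring voxels.
    v0, v1: their scalar values, one below and one above iso.
    Assumption: the value varies linearly along the edge.
    """
    if v0 == v1:
        raise ValueError("values are equal; no unique crossing on this edge")
    t = (iso - v0) / (v1 - v0)
    if not 0.0 <= t <= 1.0:
        raise ValueError("iso value does not lie between v0 and v1")
    # Move the fraction t of the way from p0 to p1.
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))
```

With voxel values 10 and 30 at x = 0 and x = 1, a requested value of 15 lies a quarter of the way along the edge, at x = 0.25.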
Each category of volume rendering techniques has its own advantages and disadvantages. Because direct methods deal with the entire volume at once they can include all the different structures that are represented by the volumetric data when creating the rendered view for display. Direct methods also allow the viewer to highlight different features within the volumetric data, without totally eliminating the rendering of other features, by altering how the voxels are blended together during the rendering process. For example, a transparent or translucent structure can be viewed at the same time as the structure within the transparent structure is highlighted. The functions that control how voxels are blended together are called transfer functions. Changing these functions may result in visual effects such as highlighting a blood vessel instead of muscle tissue. Finally, with direct volume rendering the rendered data has an “interior,” i.e. if you pass a cut plane through it there is rendered data in the newly exposed surface.
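As a rough illustration of a transfer function, the sketch below maps a voxel's scalar value to a (color, opacity) pair and highlights one value range while keeping everything else faintly visible. The value ranges, colors, and opacities are made-up numbers chosen for the example, not real tissue values, and the name make_transfer_function is hypothetical.

```python
def make_transfer_function(lo, hi, highlight_opacity, background_opacity=0.05):
    """Build a transfer function that highlights values in [lo, hi].

    Returns a function mapping a scalar voxel value to (color, opacity).
    All numeric settings are illustrative assumptions.
    """
    def transfer(value):
        if lo <= value <= hi:
            return (1.0, highlight_opacity)   # bright and mostly opaque
        return (0.3, background_opacity)      # dim and nearly transparent
    return transfer


# Two settings over the same data: swapping the function changes what
# stands out, without re-extracting anything from the volume.
highlight_a = make_transfer_function(200, 300, 0.9)
highlight_b = make_transfer_function(50, 120, 0.9)
```

A voxel with value 250 is rendered nearly opaque under highlight_a and nearly transparent under highlight_b, which is the kind of switch described above between highlighting one structure and another.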
In contrast to direct methods, indirect methods render only a surface, i.e. internal data is lost. A serious drawback is that only one surface at a time can be extracted. For example, if one wishes to highlight a blood vessel or other tissue, one must first revert to the original data and extract an entirely different surface—this is an expensive process. The one advantage that indirect methods have enjoyed is that they produce a surface that can be used for collision detection. This is particularly useful in virtual endoscopic procedures, for example, virtual colonoscopy. Even so, direct methods do allow path navigation along centerlines, even though there is no explicit surface with which to detect a collision.
Direct methods are predominant today, and among direct methods ray casting is by far the most popular and advantageous, but also the most computationally intensive. Ray casting of volumetric data entails casting a ray (i.e., a directional line) from the eye (through a camera lens) through a projection plane, depicted by a two-dimensional display, and following that line through a virtual three-dimensional grid encompassing the volume in the scene. The computer application represents the ray from the eye or the user's viewpoint as a parametric equation. The application can also mathematically represent the six sides of a box encompassing the volumetric data (the bounding box). The application then computes the intersection of the ray with the sides of the bounding box.
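One common way to compute the ray and bounding box intersection is the slab method, sketched below. The text does not say which method the application uses, so this is an assumed technique. The ray is written as P(t) = origin + t * direction, the box is axis-aligned, and the function name intersect_ray_box is hypothetical.

```python
import math


def intersect_ray_box(origin, direction, box_min, box_max):
    """Slab-method intersection of a ray with an axis-aligned box.

    Ray: P(t) = origin + t * direction, for t >= 0.
    Returns (t_enter, t_exit), or None if the ray misses the box.
    """
    t_near, t_far = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of sides: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)   # latest entry across the three slabs
        t_far = min(t_far, t1)     # earliest exit across the three slabs
        if t_near > t_far:
            return None
    if t_far < 0.0:
        return None                # box lies entirely behind the ray origin
    return max(t_near, 0.0), t_far
```

A ray starting at (-1, 0.5, 0.5) and travelling along +x enters the unit box at t = 1 and leaves at t = 2; only the samples between those two parameter values need to be fetched and blended.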
With the emergence of multi-core computer architectures, microprocessor performance has increased to a point where ray casting can practically be employed for real-time rendering of volumetric data. Multi-core processors based on industry-standard microarchitectures typically support virtual memory management, hardware memory caches, and preemptive multitasking. In this case, access to volumetric data by the arithmetic logic units of the processor is managed by the hardware that supports the memory hierarchy. This implies that computations are performed on data that has been loaded into very fast register memory; data not in register memory is ideally in fast cache memory; data not in cache memory is in slower system memory; and finally, data not in system memory is on disk or some other permanent storage medium.
There may be several levels of cache, each larger and slower than the preceding level. Modern hardware checks level 1 (L1) first for data, then level 2 (L2), then level 3 (L3). If the data is not found in any of these levels, it must be loaded from system memory. This is an expensive operation because the latency of moving data from system memory into the L3, L2, and L1 caches can be many clock cycles. During this time, the microprocessor's arithmetic unit can sit idle, wasting computing resources and degrading application performance.
The process of loading volumetric data into registers where the arithmetic logic units (ALU) can perform operations on the volumetric data is essentially handled by the hardware. The problem is that if data is not in register memory it must be fetched from cache. If it is not in cache it must be loaded into cache from main memory and if it is not in main memory it must be loaded from disk. This multi-level memory hierarchy makes it difficult to control performance as tracing a ray from the eye through the volume results in many accesses to non-contiguous locations in memory and hence increased probability that data will not be in one of the aforementioned memory locations.
In non-traditional architectures such as the Cell/BE (Cell Broadband Engine Architecture) or graphics processing units (GPUs), there is less sophisticated hardware support for a multi-level memory hierarchy. The programmer must manage the movement of data from one level of memory to another. The advantage is predictability in response and overall performance; the disadvantage is that more programming is required to explicitly manage data movement. This tradeoff is being made by today's microprocessor architects because the next generation of applications depends on real-time, predictable responses. Additionally, the overall performance of the next generation of microprocessors can be increased substantially by providing less hardware support to manage memory and more to manage computation. Even so, programmability must be addressed. This same set of circumstances applies to the commodity GPUs being promoted by AMD, Intel, and NVIDIA for compute-intensive applications such as volume rendering.
In the case of direct volume rendering using ray casting, the algorithm normally accesses data (voxels) from non-contiguous memory locations. On traditional hardware, the application would simply request the data and the hardware would automatically fetch it from one memory location or another. In the case of GPUs, DSPs (digital signal processors), or the Cell/BE, there is no hardware support for this. To further complicate programmability, memory close to the computation units is usually very small, i.e., too small to store an entire volume of data. The algorithm and programmer must address this in an intelligent manner to achieve the highest performance.
One solution to this problem is to access the data on demand. Essentially, as the ray or rays cast from the eye are extended through the volume (for all values of the ray parameter t, 0 ≤ t ≤ 1), the program computes the memory address of the next required data location and fetches it. This will result in many small accesses to non-contiguous memory locations, and the overhead of these accesses, in the form of cache miss latency, will likely impact overall volume rendering performance. Additionally, the computational units will be idle for a greater percentage of the time as they wait for the next small piece of data, so processor utilization will be lower than ideal.
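The address computation in the on-demand approach can be sketched as below. The sketch assumes the volume is stored as a flat array with x varying fastest, then y, then z, and that each sample uses the voxel containing the sample point; neither detail is given in the text, and the function names are hypothetical.

```python
import math


def voxel_offset(x, y, z, dims):
    """Element offset of voxel (x, y, z) in a flat, x-fastest array (assumed layout)."""
    nx, ny, _nz = dims
    return (z * ny + y) * nx + x


def offsets_along_ray(start, end, dims, num_steps):
    """Offsets of the voxels sampled at evenly spaced t in [0, 1] along a ray.

    The sample position is start + t * (end - start); samples that fall
    outside the volume are skipped.
    """
    nx, ny, nz = dims
    offsets = []
    for i in range(num_steps + 1):
        t = i / num_steps
        x, y, z = (math.floor(s + t * (e - s)) for s, e in zip(start, end))
        if 0 <= x < nx and 0 <= y < ny and 0 <= z < nz:
            offsets.append(voxel_offset(x, y, z, dims))
    return offsets
```

Under this layout, in a 256 x 256 x 256 volume a ray running along x touches adjacent elements (offsets 0, 1, 2, ...), while a ray running along z jumps 65,536 elements between consecutive samples, which is the non-contiguous access pattern described above.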
Another solution includes selecting a larger region of the volume data surrounding the x, y, z position of the ray at any point in time t to load into memory and working with that region. It is highly likely that elements of a particular region may be needed to process subsequently cast rays. In this case, a software caching scheme might be employed to keep frequently used data closer to the arithmetic logic units (ALUs). This approach provides some of the advantages of a traditional microprocessor, but has several key disadvantages that make it less than an ideal solution:
(1) Some portion of fast access memory must be reserved to cache data.
(2) Some portion of fast access memory must be used to store code that manages this software cache.
(3) Access to data is not immediate, i.e. a caching scheme and the logic that supports it must be invoked. Ultimately this will lead to performance challenges, e.g. if data is not in software cache, a cache miss will be issued and additional logic is used to go fetch the data from the next level of memory in the hierarchy.
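A software caching scheme of this kind might look like the following minimal sketch. It assumes the volume is divided into cubic regions ("bricks"), that a least-recently-used policy decides which brick to evict, and that a caller-supplied function loads a brick from the slower level of memory. The class BrickCache and the stand-in loader synthetic_fetch are hypothetical names, not part of any real library.

```python
from collections import OrderedDict


class BrickCache:
    """Tiny LRU cache of cubic volume regions, keyed by brick coordinates."""

    def __init__(self, fetch_brick, brick_size=8, capacity=4):
        self.fetch_brick = fetch_brick   # loads a brick from slower memory
        self.brick_size = brick_size
        self.capacity = capacity         # fast memory reserved for data, as in (1)
        self.bricks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, x, y, z):
        """Return the voxel at (x, y, z); this lookup logic is the overhead in (2) and (3)."""
        b = self.brick_size
        key = (x // b, y // b, z // b)
        if key in self.bricks:
            self.hits += 1
            self.bricks.move_to_end(key)          # mark as most recently used
        else:
            self.misses += 1
            if len(self.bricks) >= self.capacity:
                self.bricks.popitem(last=False)   # evict least recently used
            self.bricks[key] = self.fetch_brick(key, b)
        return self.bricks[key][(x % b, y % b, z % b)]


def synthetic_fetch(key, b):
    """Stand-in for a slow memory transfer: voxel value is x + y + z."""
    bx, by, bz = key
    return {(i, j, k): (bx * b + i) + (by * b + j) + (bz * b + k)
            for i in range(b) for j in range(b) for k in range(b)}
```

Every read pays for the key computation and the lookup even on a hit, and a miss adds the eviction and fetch logic on top, which is the cost that items (1) through (3) describe.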