The present invention is related to the field of computer graphics, and in particular to rendering volumetric data sets using hardware pipelines.
Volume graphics is the subfield of computer graphics that deals with the visualization of objects or phenomena represented as sampled data in three or more dimensions. These samples are called volume elements, or “voxels,” and contain digital information representing physical characteristics of the objects or phenomena being studied. For example, voxel values for a particular object or system may represent density, type of material, temperature, velocity, or some other property at discrete points in space throughout the interior and in the vicinity of that object or system.
Volume rendering is the part of volume graphics concerned with the projection of volume data as two-dimensional images for purposes of printing, display on computer terminals, and other forms of visualization. By assigning colors and transparency to particular voxel data values, different views of the exterior and interior of an object or system can be displayed. For example, a surgeon needing to examine the ligaments, tendons, and bones of a human knee in preparation for surgery can utilize a tomographic scan of the knee and cause voxel data values corresponding to blood, skin, and muscle to appear to be completely transparent. The resulting image then reveals the condition of the ligaments, tendons, bones, etc. which are hidden from view prior to surgery, thereby allowing for better surgical planning, shorter surgical operations, less surgical exploration and faster recoveries. In another example, a mechanic using a tomographic scan of a turbine blade or welded joint in a jet engine can cause voxel data values representing solid metal to appear to be transparent while causing those representing air to be opaque. This allows the viewing of internal flaws in the metal that would otherwise be hidden from the human eye.
Real-time volume rendering is the projection and display of volume data as a series of images in rapid succession, typically at 30 frames per second or faster. This makes it possible to create the appearance of moving pictures of the object, phenomenon, or system of interest. It also enables a human operator to interactively control the parameters of the projection and to manipulate the image, thus providing the user with immediate visual feedback. It will be appreciated that projecting tens of millions or hundreds of millions of voxel values to an image requires enormous amounts of computing power. Doing so in real time requires substantially more computational power.
Additional general background on volume rendering is presented in a book entitled “Introduction to Volume Rendering” by Barthold Lichtenbelt, Randy Crane, and Shaz Naqvi, published in 1998 by Prentice Hall PTR of Upper Saddle River, N.J. Further background on volume rendering architectures is found in a paper entitled “Towards a Scalable Architecture for Real-time Volume Rendering” presented by H. Pfister, A. Kaufman, and T. Wessels at the 10th Eurographics Workshop on Graphics Hardware at Maastricht, The Netherlands, on Aug. 28 and 29, 1995. This paper describes an architecture now known as “Cube 4.” The Cube 4 is also described in a Doctoral Dissertation entitled “Architectures for Real-Time Volume Rendering” submitted by Hanspeter Pfister to the Department of Computer Science at the State University of New York at Stony Brook in December 1996, and in U.S. Pat. No. 5,594,842, “Apparatus and Method for Real-time Volume Visualization.”
Cube 4 and other architectures achieve real-time volume rendering using the technique of parallel processing. A plurality of processing elements are deployed to concurrently perform volume rendering operations on different portions of a volume data set, so that the overall time required to render the volume is reduced in substantial proportion to the number of processing elements. In addition to requiring a plurality of processing elements, parallel processing of volume data requires a high-speed interface between the processing elements and a memory storing the volume data, so that the voxels can be retrieved from the memory and supplied to the processing elements at a sufficiently high data rate to enable the real-time rendering to be achieved.
Volume rendering as performed by Cube 4 is an example of a technique known as “ray-casting.” A large number of rays are passed through a volume in parallel and processed by evaluating the volume data a slice at a time, where a “slice” is a planar set of voxels parallel to a face of the volume data set. Using a fast slice-processing technique in specialized hardware, as opposed to software, frame processing rates can be increased to be higher than two frames per second.
The essence of the Cube-4 system is that the three dimensional sampled data representing the object is distributed across the memory modules by a technique called “skewing,” so that adjacent voxels in each dimension are stored in adjacent memory modules independent of view direction. Each memory module is dedicated to its own processing pipeline. Moreover, voxels are organized in the memory modules so that if there are a total of P pipelines and P memory modules, then P adjacent voxels can be fetched in parallel within a single clock cycle of a computer memory system, independent of the view direction. This reduces the total time to fetch voxels from memory by a factor of P. For example, if the data set has 256³ voxels and P has the value four, then only 256³/4 or approximately four million memory cycles are needed to fetch the data in order to render an image.
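The skewing idea and the cycle count above can be illustrated with a minimal sketch. The mapping `(x + y + z) mod P` is an assumption chosen because it satisfies the stated property (adjacent voxels in each dimension fall in adjacent modules); the text does not give the exact formula, and the function names are hypothetical.

```python
P = 4  # number of memory modules and pipelines, as in the text's example


def module_of(x, y, z, p=P):
    """Assumed skewing rule: index of the memory module holding voxel (x, y, z)."""
    return (x + y + z) % p


def fetch_cycles(n, p=P):
    """Memory cycles needed to fetch an n*n*n volume at p voxels per cycle."""
    return n ** 3 // p


# P adjacent voxels along any axis land in P distinct modules,
# so they can be fetched in one cycle regardless of view direction.
along_x = {module_of(x, 5, 9) for x in range(P)}
along_y = {module_of(7, y, 9) for y in range(P)}
along_z = {module_of(7, 5, z) for z in range(P)}
print(along_x == along_y == along_z == set(range(P)))  # True

print(fetch_cycles(256))  # 4194304, i.e. approximately four million
```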
An additional characteristic of the Cube-4 system is that the computational processing required for volume rendering is organized into pipelines with specialized functions for this purpose. Each pipeline is capable of starting the processing of a new voxel in each cycle. Thus, in the first cycle, the pipeline fetches a voxel from its associated memory module and performs the first step of processing. Then in the second cycle, the pipeline performs the second step of processing of this first voxel, while at the same time fetching the second voxel and performing the first step of processing this voxel. Likewise, in the third cycle, the pipeline performs the third processing step of the first voxel, the second processing step of the second voxel, and the first processing step of the third voxel. In this manner, voxels from each memory module progress through its corresponding pipeline in lock-step fashion, one after another, until all voxels are fully processed. Thus, instead of requiring 10 to 100 software instructions per voxel, a new voxel can be processed in every clock cycle.
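The lock-step progression described above can be sketched as a simple schedule. This is an idealized model that assumes no stalls and one stage per cycle; cycles, voxels, and stages are numbered from zero, and the function names are hypothetical.

```python
def pipeline_schedule(num_voxels, num_stages):
    """Return {cycle: [(voxel, stage), ...]} for an ideal pipeline with no stalls."""
    schedule = {}
    for voxel in range(num_voxels):
        for stage in range(num_stages):
            cycle = voxel + stage  # voxel v occupies stage s during cycle v + s
            schedule.setdefault(cycle, []).append((voxel, stage))
    return schedule


def total_cycles(num_voxels, num_stages):
    """Cycles until the last voxel leaves the last stage."""
    return num_voxels + num_stages - 1


sched = pipeline_schedule(num_voxels=3, num_stages=3)
for cycle in sorted(sched):
    print(cycle, sched[cycle])
# Cycle 2 (the "third cycle" in the text) holds
# [(0, 2), (1, 1), (2, 0)]: third step of the first voxel,
# second step of the second, first step of the third.
```

Once the pipeline is full, one voxel completes per cycle, which is the source of the throughput claim in the paragraph above.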
Skewing can disperse adjacent voxels over any of the pipelines, and since the pipelines are dedicated to memory modules, the Cube-4 system must communicate voxel data with four other pipelines, i.e., the two neighboring pipelines on either side. Such communication is required, for example, to transmit voxel values from one pipeline to another for purposes such as estimating gradients or normal vectors so that lighting and shadow effects can be calculated. Pipeline interconnects are used to communicate the values of rays as they pass through the volume accumulating visual characteristics of the voxels in the vicinities of the areas through which they pass. Having a large number of interconnects among the pipelines increases the complexity of the system.
In the Cube-4 system, volume rendering proceeds as follows. Data are organized as a cube or other parallelepiped data structure. Considering first the face of this cube or solid that is most nearly perpendicular to the view direction, a partial beam of P voxels at the top corner is fetched from P memory modules concurrently, in one memory cycle, and inserted into the first stage of the P processing pipelines. In the second cycle these voxels are moved to the second stage of their respective pipelines. At the same time, the next P voxels are fetched from the same beam and inserted into the first stage of their pipelines. In each subsequent cycle, P more voxels are fetched from the top beam and inserted into their pipelines, while previously fetched voxels move to later stages of their pipelines. This continues until the entire beam of voxels has been processed. In the terminology of the Cube-4 system, a row of voxels is called a “beam” and a group of P voxels within a beam is called a “partial beam.”
After the groups of voxels in a beam have been processed, the voxels of the next beam are processed, and so on, until all of the beams of the face of the volume data set have been fetched and inserted into their processing pipelines. This face is called a “slice.” Then, the Cube-4 system moves again to the top corner, but this time starts fetching the P voxels in the top beam immediately behind the face, that is, from the second “slice.” In this way, it progresses through the second slice of the data set, a beam at a time and within each beam, P voxels at a time. After completing the second slice, it proceeds to the third slice, then to subsequent slices in a similar manner, until all slices have been processed. The purpose of this approach is to fetch and process all of the voxels in an orderly way, P voxels at a time, until the entire volume data set has been processed and an image has been rendered.
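The slice, beam, and partial-beam ordering described above can be sketched as three nested loops. The axis assignment (x along a beam, y selecting the beam, z selecting the slice) and the requirement that the beam length be divisible by P are assumptions made for illustration; the function name is hypothetical.

```python
def traversal(nx, ny, nz, p):
    """Yield partial beams (lists of p voxel coordinates) in fetch order.

    Assumes x runs along a beam, y selects the beam within a slice,
    z selects the slice, and nx is divisible by p.
    """
    for z in range(nz):                  # slice by slice, front to back
        for y in range(ny):              # beam by beam within the slice
            for x0 in range(0, nx, p):   # partial beam by partial beam
                yield [(x, y, z) for x in range(x0, x0 + p)]


beams = list(traversal(nx=8, ny=2, nz=2, p=4))
print(len(beams))  # 8 partial beams: 8*2*2 voxels at 4 voxels per fetch
print(beams[0])    # [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
print(beams[2])    # [(0, 1, 0), (1, 1, 0), (2, 1, 0), (3, 1, 0)]
```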
The processing stages of the Cube-4 system perform all of the calculations required for the ray-casting technique, including interpolation of samples, estimation of the gradients or normal vectors, assignments of colors and transparency or opacity, and calculation of lighting and shadow effects to produce the final image on the two dimensional view surface.
The Cube-4 system is designed to be capable of being implemented in semiconductor technology. However, two limiting factors prevent Cube-4 from achieving the small size and low cost necessary for personal or desktop-size computers, namely the rate of accessing voxel values from memory modules, and the amount of internal storage required in each processing pipeline. With regard to the rate of accessing memory, the method of skewing voxel data across memory modules in Cube-4 leads to inefficient patterns of accessing voxel memory that are as slow as random accesses. Therefore, in order to achieve real-time volume rendering performance, voxel memory in a practical implementation of Cube-4 must either comprise very expensive static random access memory (SRAM) modules or a very large number of independent Dynamic Random Access Memory (DRAM) modules to provide adequate access rates. With regard to the internal storage, the Cube-4 algorithm requires that each processing pipeline store intermediate results within itself during processing, the amount of storage being proportional to the area of the face of the volume data set being rendered. For a 256³ data set, this amount turns out to be so large that the size of a single chip processing pipeline is excessive, and therefore impractical for a personal computer system.
In order to make real-time volume rendering practical for personal and desktop computers, an improvement upon the Cube-4 system referred to as “EM Cube” employs techniques including architecture modifications to permit the use of high capacity, low cost Dynamic Random Access Memory or DRAM devices for memory modules. The EM Cube system is described in U.S. patent application Ser. No. 08/905,238, filed Aug. 1, 1997, entitled “Real-Time PC Based Volume Rendering System”, and is further described in a paper by R. Osborne, H. Pfister, et al. entitled “EM-Cube: An Architecture for Low-Cost Real-Time Volume Rendering,” published in the Proceedings of the 1997 SIGGraph/Eurographics Workshop on Graphics Hardware, Los Angeles, California, on Aug. 3-4, 1997.
The EM-Cube system utilizes DRAM chips that support “burst mode” access to achieve both low cost and high access rates to voxel memory. In order to exploit the burst mode, EM Cube incorporates architectural modifications that are departures from the Cube-4 system. In a first modification, called “blocking,” voxel data are grouped into blocks, independent of a view direction, so that all voxels within a block are stored at consecutive memory addresses within a single memory module. Each processing pipeline fetches an entire block of neighboring voxels in a burst rather than one voxel at a time. In this way, a single processing pipeline can access memory at data rates of 125 million or more voxels per second, thus making it possible for four processing pipelines and four DRAM modules to render 256³ data sets at 30 frames per second.
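The data-rate figure above follows from simple arithmetic, worked here as a check. It assumes every voxel is read exactly once per frame and the load is split evenly across the pipelines; the variable names are hypothetical.

```python
voxels_per_frame = 256 ** 3   # 16,777,216 voxels in the data set
frames_per_second = 30
pipelines = 4

# Aggregate voxel rate needed for real-time rendering.
required_rate = voxels_per_frame * frames_per_second
# Share of that rate each pipeline and its DRAM module must sustain.
per_pipeline = required_rate / pipelines

print(required_rate)  # 503316480 voxels per second
print(per_pipeline)   # 125829120.0 voxels per second
```

The per-pipeline result, roughly 125.8 million voxels per second, is consistent with the "125 million or more" figure in the text.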
In EM Cube, each block is processed in its entirety within the associated processing pipeline. EM Cube employs an inter-chip communication scheme to enable each pipeline to communicate intermediate values to neighboring pipelines as required. For example, when a pipeline in EM Cube encounters either the right, bottom or rear face of a block, it is necessary to transmit partially accumulated rays and other intermediate values to the pipeline that is responsible for processing the next block located on the other side of the respective face. Significant inter-chip communication bandwidth is required to transmit these intermediate values to any other pipeline. However, the amount of inter-chip communication is reduced by blocking.
Like Cube 4, the EM Cube architecture is designed to be scalable, so that the same basic building blocks can be used to build systems with significantly different cost and performance characteristics. In particular, the above-described block processing technique and inter-chip communication structure of EM Cube are designed such that systems using different numbers of chips and processing pipelines can be implemented. Thus, block-oriented processing and high-bandwidth inter-chip communication help EM Cube to achieve its goals of real-time performance and scalability. It will be appreciated, however, that these features also have attendant costs, notably the cost of providing area within each processing pipeline for block storage buffers and also the costs of chip I/O pins and circuit board area needed to effect the inter-chip communication.
In a second modification to the Cube-4 architecture, EM Cube also employs a technique called “sectioning” in conjunction with blocking in order to reduce the amount of on-chip buffer storage required for rendering.
In this technique, the volume data set is subdivided into sections and rendered a section at a time. Partially accumulated rays and other intermediate values are stored in off-chip memory across section boundaries. Because each section presents a face with a smaller area to the rendering pipeline, less internal storage is required. The effect of that technique is to reduce the amount of intermediate storage in a processing pipeline to an acceptable level for semiconductor implementation.
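The storage saving from sectioning can be illustrated with a rough count. The section width of 32 voxels is a purely hypothetical value chosen for illustration, and the sketch counts one intermediate entry per face position, which is only a proxy for the real buffer contents.

```python
def face_entries(width, height):
    """Intermediate entries needed, taken as proportional to the face area."""
    return width * height


n = 256             # edge length of the volume, as in the text's example
section_width = 32  # hypothetical section width, not taken from the text

full_face = face_entries(n, n)              # whole face rendered at once
one_section = face_entries(section_width, n)  # face of a single section

print(full_face)                 # 65536
print(one_section)               # 8192
print(full_face // one_section)  # 8 times less on-chip storage per pass
```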
Sectioning in EM Cube is an extension of the basic block-oriented processing scheme and is supported by some of the same circuitry required for the communication of intermediate values necessitated by the block processing architecture. However, sectioning in EM Cube results in very bursty demands upon off-chip memory modules in which partially accumulated rays and other intermediate values are stored. That is, intermediate data are read and written at very high data rates when voxels near a section boundary are being processed, while at other times no intermediate data are being read from or written to the off-chip memory. In EM Cube it is sensible to minimize the amount of intermediate data stored in these off-chip memory modules in order to minimize the peak data rate to and from the off-chip memory when processing near a section boundary. Thus in EM Cube many of the required intermediate values are re-generated within the processing pipelines rather than being stored in and retrieved from the off-chip memory modules. During the processing carried out in each section near the boundary with the preceding section, voxels from the preceding section are re-read and partially processed in order to re-establish the intermediate values in the processing pipeline that are required for calculation in the new section.
While the EM Cube system achieves greater cost effectiveness than the prior Cube 4 system, it would be desirable to further lower costs to enable more widespread enjoyment of the benefits of volume rendering. Further, it would be desirable to achieve such cost reductions while retaining real-time performance levels. It would also be desirable to achieve rendering performance of 256³ voxels at 24 frames per second, or better, with a single integrated semiconductor chip.
The invention provides a volume rendering integrated circuit including a plurality of interconnected pipelines. The pipelines are identical, and each includes multiple different rendering stages. In one embodiment, the stages of the pipelines are interconnected in a ring, with data being passed in only one direction around the ring to one immediately adjacent neighboring pipeline. The volume rendering integrated circuit also includes a render controller for controlling the flow of volume data to and from the pipelines and for controlling the various rendering operations of the pipelines. The integrated circuit may further include interfaces for coupling the integrated circuit to various storage devices and to a host computer. According to one aspect of the invention, a volume rendering graphics device renders a volume data set arranged as an array of voxels. The device includes a plurality of pipelines. The pipelines operate in parallel. The plurality of pipelines are coupled in a ring, and each one of the plurality of pipelines forwards data to only one other neighboring pipeline in the ring.
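The one-directional ring can be pictured with a small sketch in which each pipeline forwards a value to exactly one neighbor. This models only the topology; the choice of "next index, wrapping around" as the forwarding direction and the function names are assumptions, not details from the text.

```python
def ring_neighbor(i, p):
    """Index of the single pipeline that pipeline i forwards to (assumed: i + 1, wrapping)."""
    return (i + 1) % p


def circulate(values, hops):
    """Pass each pipeline's value one step around the ring, `hops` times."""
    p = len(values)
    for _ in range(hops):
        forwarded = [None] * p
        for i, value in enumerate(values):
            forwarded[ring_neighbor(i, p)] = value
        values = forwarded
    return values


print(circulate(["a", "b", "c", "d"], 1))  # ['d', 'a', 'b', 'c']
print(circulate(["a", "b", "c", "d"], 4))  # ['a', 'b', 'c', 'd']
```

With P pipelines, each pipeline needs only one outgoing link, so the ring uses P links in total, in contrast to the four-neighbor communication described for Cube-4.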
According to another aspect of the invention, a volume graphics integrated circuit includes a plurality of pipelines connected to a host device. A memory interface couples the plurality of pipelines to a first storage device storing a volume data set. A pixel interface couples the plurality of pipelines to a second storage device, the second storage device for storing pixel data representative of one view of the volume data set stored in the first storage device. A section interface couples the plurality of pipelines to a third storage device, the third storage device for storing rendering data associated with at least a section of the portion of the volume data set.