This invention relates to volume rendering and more particularly to a memory architecture which permits real-time volume rendering through the rapid read out of memory using minimum size data blocks for storing volume data sets.
Volume rendering is part of volume graphics, the subfield of computer graphics that deals with the visualization of objects or phenomena represented as sampled data in three or more dimensions. These samples are called volume elements, or "voxels," and contain digital information representing physical characteristics of the objects or phenomena being studied. Volume rendering is the area of volume graphics concerned with the projection of volume data as two-dimensional images for purposes of printing, display on computer terminals, and other forms of visualization. Real-time volume rendering is the projection and display of volume data as a series of images in rapid succession, typically at 30 frames per second or faster, thereby making it possible for a human operator to interactively control the parameters of the projection and to manipulate the image, while providing immediate visual feedback.
While software methods for volume rendering have been practiced for ten to twenty years, they have not been usable for real-time volume rendering, both because of the enormous amount of computing power required and because of the difficulty of reading and moving voxel data fast enough. Even the rapid increase in the power of modern personal computers is unlikely to be enough to support real-time volume rendering in software for many years to come. For example, to render a volume data set with 256 voxels on each edge, that is, a total of 256^3 or more than 16 million voxels, and to do so in real time, it is necessary to read and process all 16 million voxels 30 or more times per second. This amounts to a reading and processing rate of more than 500 million voxels per second, a rate far exceeding the computing power and memory bandwidth available in a modern personal computer. It will be appreciated that a volume data set of 512^3 voxels requires a reading and processing rate eight times larger, or approximately 4 billion voxels per second, and a volume data set of 1024^3 voxels requires a rate eight times larger again, or approximately 32 billion voxels per second. Even by using established software techniques for reducing the number of voxels processed in each frame, the rate still exceeds the memory bandwidth and computing power of a modern personal computer.
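By way of illustration only, the rates quoted above can be reproduced with a short calculation. The following Python sketch assumes a cubic volume data set in which every voxel is read once per frame, at 30 frames per second; the function name voxel_rate is hypothetical and is used here only for the example.

```python
def voxel_rate(edge: int, frames_per_second: int = 30) -> int:
    """Voxels to read and process per second for a cubic volume.

    Assumes every voxel of an edge x edge x edge volume is read
    exactly once per frame (no software culling of voxels).
    """
    return edge ** 3 * frames_per_second


for edge in (256, 512, 1024):
    rate = voxel_rate(edge)
    print(f"{edge}^3 voxels at 30 frames/s: {rate / 1e6:,.0f} million voxels/s")
```

Doubling the edge length multiplies the voxel count, and therefore the required rate, by eight, which is the source of the 4 billion and 32 billion figures.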
However, modern semiconductor technology makes it possible to build a special purpose volume rendering system, for example as an accessory to a personal computer by way of a plug-in circuit board. In such a system, voxel data is stored in a plurality of Dynamic Random Access Memory modules, also called DRAM chips. The data is read and processed by one or more parallel, pipelined processing elements to project images at real-time frame rates. One of the challenges in such a special purpose system is to read the voxel data out of memory fast enough. The required rate exceeds the bandwidth of all but the fastest DRAM chips operating in burst mode, that is, in a mode of reading a series of data values stored at adjacent memory addresses in rapid succession. Even in this case, it is necessary to maximize the efficiency of memory to nearly 100%, that is, to operate burst mode DRAM chips at nearly 100% of their rated bandwidth.
In U.S. patent application Ser. No. 08/905,238, filed Aug. 1, 1997 and incorporated herein by reference, a real-time volume rendering system is described in which voxel data is organized into blocks so that all voxels within a block are stored in a single memory module at adjacent memory addresses. This makes it possible to fetch an entire block of data in a burst rather than one voxel at a time, thereby taking advantage of the burst mode access associated with DRAM. Once a block of voxels has been fetched, the voxels are passed to one or more processing pipelines at the rate of one voxel per cycle per pipeline. Meanwhile, the fetching of a subsequent block of voxels begins. A typical high-performance DRAM chip is capable of being operated at rates of 133 million, 143 million, or 166 million data elements per second, corresponding to cycle times of 7.5 nanoseconds, 7 nanoseconds, and 6 nanoseconds, respectively. If each voxel value comprises one DRAM data element, then approximately four DRAM chips are needed to operate in parallel in order to achieve the necessary data rate of 500 million voxels per second.
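The estimate of approximately four DRAM chips can be checked as follows. This Python sketch is illustrative only; it assumes one voxel per DRAM data element, one data element per cycle, and operation at 100% of rated bandwidth. The function name chips_needed is hypothetical.

```python
import math


def chips_needed(voxels_per_second: float, cycle_time_ns: float) -> int:
    """DRAM chips required in parallel to sustain a given voxel rate.

    Assumes one voxel per data element and one data element per cycle,
    with every chip running at its full rated bandwidth.
    """
    elements_per_second = 1e9 / cycle_time_ns
    return math.ceil(voxels_per_second / elements_per_second)


required = 256 ** 3 * 30  # voxels per second for a 256^3 volume at 30 frames/s

for cycle_time_ns in (7.5, 7.0, 6.0):
    print(f"{cycle_time_ns} ns cycle: {chips_needed(required, cycle_time_ns)} chips")
```

Because the result is rounded up to a whole number of chips, any loss of per-chip efficiency below 100% quickly raises the count, which is why the efficiency of each chip matters.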
It will be appreciated that the order of reading blocks of voxel data depends upon the direction of viewing a volume data set, that is, the position of the image plane with respect to the volume data set. In order to achieve the necessary voxel reading and processing rate for any viewing direction, it is necessary to distribute voxel data across the DRAM chips of a real-time volume rendering system so that there are no conflicts in the parallel operation of the DRAM chips. This is achieved by the method of "skewing" voxel data as implemented in a system called Cube-4, described in a Doctoral Dissertation entitled "Architectures for real-time Volume Rendering" submitted by Hanspeter Pfister to the Department of Computer Science at the State University of New York at Stony Brook in December 1996, and further described in U.S. Pat. No. 5,594,842, "Apparatus and Method for Real-time Volume Visualization." This method of skewing has been improved and adapted to a memory organization of blocks of voxels in a system called EM-Cube, as described in U.S. patent application Ser. No. 08/905,238, cited above.
The essence of the skewing of the Cube-4 system is that adjacent voxels are stored in different DRAM chips. This is true in all three dimensions, so that it is possible to concurrently fetch any group of adjacent voxels aligned with any axis of the volume data set from the same number of DRAM chips. This maximizes the efficiency of using DRAM chips in parallel, but it inefficiently utilizes the bandwidth of each DRAM chip. The essence of the EM-Cube system is that adjacent blocks of voxels, rather than individual voxels, are stored in adjacent DRAM chips. This improves the efficiency with which the bandwidth of each DRAM chip is used, but the amount of the improvement depends upon the size of the blocks, because of the way DRAM chips are organized into banks.
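The difference between voxel skewing and block skewing can be sketched as follows. The text above does not state the skewing function; this Python example assumes the linear form (x + y + z) mod N commonly associated with this kind of skewing, and the names voxel_module and block_module, the module count of four, and the block edge of eight are hypothetical choices made only for illustration.

```python
def voxel_module(x: int, y: int, z: int, num_modules: int) -> int:
    """Assumed per-voxel skewing: neighbors along any axis differ by one module."""
    return (x + y + z) % num_modules


def block_module(x: int, y: int, z: int, num_modules: int, block_edge: int) -> int:
    """Assumed per-block skewing: the same rule applied to block coordinates."""
    bx, by, bz = x // block_edge, y // block_edge, z // block_edge
    return (bx + by + bz) % num_modules


NUM_MODULES = 4  # hypothetical number of DRAM chips
BLOCK_EDGE = 8   # hypothetical block edge, in voxels

# Four adjacent voxels along the x axis map to four different modules.
print([voxel_module(x, 5, 9, NUM_MODULES) for x in range(4)])

# Voxels inside one block share a module; adjacent blocks use different modules.
print([block_module(x, 5, 9, NUM_MODULES, BLOCK_EDGE) for x in (0, 7, 8, 16, 24)])
```

Under these assumptions, any run of N adjacent voxels (or N adjacent blocks) along any axis touches each of the N modules exactly once, so all modules can be read in parallel without conflict.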
In particular, a modern DRAM chip comprises a plurality of banks of memory, each bank comprising a plurality of rows, and each row comprising a plurality of data elements at consecutive memory addresses. Such a DRAM chip can sustain its maximum rated bandwidth while reading or writing data within a single row of a single bank. At the same time, a row of a different bank can be "pre-charged" or prepared for transfer, so that reading or writing can continue without interruption from the previous row of the previous bank to the new row of the new bank. However, a DRAM chip cannot support the reading of or writing to two different rows of the same bank in quick succession. That is, it is impossible to pre-charge one row of a bank while reading from or writing to a different row of that same bank. Some DRAM chips impose additional constraints, for example, prohibiting the pre-charging of banks adjacent to the one that is active.
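The bank and row constraint described above can be illustrated with a deliberately simplified model. In this Python sketch, an access sequence is a list of (bank, row) pairs, and a fixed penalty is charged whenever two consecutive accesses address different rows of the same bank. The function name stall_cycles and the penalty of eight cycles are assumptions for the example, and the model ignores the additional constraints, such as conflicts between adjacent banks, that some DRAM chips impose.

```python
def stall_cycles(accesses, row_switch_delay: int = 8) -> int:
    """Total stall cycles for a sequence of (bank, row) accesses.

    Simplified model: moving to a row of a different bank is free, because
    that row can be pre-charged in the background, while moving to a
    different row of the same bank costs row_switch_delay cycles.
    """
    stalls = 0
    previous = None
    for bank, row in accesses:
        if previous is not None and bank == previous[0] and row != previous[1]:
            stalls += row_switch_delay
        previous = (bank, row)
    return stalls


same_bank = [(0, 0), (0, 1), (0, 2), (0, 3)]    # new row of the same bank each time
interleaved = [(0, 0), (1, 0), (0, 1), (1, 1)]  # alternate between two banks

print(stall_cycles(same_bank), stall_cycles(interleaved))
```

In this model the interleaved sequence incurs no stalls, whereas the sequence confined to a single bank pays the penalty on every row change.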
Whenever data is organized so that reading or writing to different rows of the same or conflicting banks is required, a delay of several cycles is imposed. In a real-time volume rendering system, the impact of this delay depends upon the size of the blocks. In an embodiment of the EM-Cube system, for example, blocks are 8×8×8 voxels or a total of 512 voxels. In this case, using a DRAM with a delay of eight cycles between rows of the same bank, it is still possible to read voxel data from a DRAM chip at approximately 97% efficiency. However, in a different embodiment having smaller blocks of 2×2×2 voxels or a total of eight voxels, the efficiency of the DRAM bandwidth would be reduced to approximately 50%. The challenge, then, for a real-time volume rendering system is to organize data to maximize the efficiency of DRAM memory, either by keeping blocks large enough or by avoiding accesses to the different rows of the same or conflicting banks in rapid succession.
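The dependence of efficiency on block size can be approximated with a simple model. This Python sketch assumes that a block is read at one voxel per cycle and that the full delay is paid once per block; both are simplifying assumptions, and the function name burst_efficiency is hypothetical.

```python
def burst_efficiency(voxels_per_block: int, delay_cycles: int) -> float:
    """Fraction of cycles that transfer data when each block costs one delay.

    Assumes one voxel per cycle during the burst and a fixed delay of
    delay_cycles between consecutive blocks.
    """
    return voxels_per_block / (voxels_per_block + delay_cycles)


for edge in (8, 2):
    block = edge ** 3
    print(f"{edge}x{edge}x{edge} block ({block} voxels): "
          f"{burst_efficiency(block, 8):.1%} efficiency")
```

Under these assumptions the model gives 50% for the eight-voxel block, in agreement with the figure above, and roughly 98% for the 512-voxel block, slightly above the approximately 97% stated above; the difference presumably reflects overheads that this simple model does not capture.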
While the prior system noted above utilized relatively large blocks of voxel data in order to maximize communication efficiency, it has now become desirable to implement the volume rendering system on a single integrated circuit or chip. However, in order to achieve real-time volume rendering performance, a change in the underlying architecture of the storage and distribution of voxel data is required.