1. Field of Invention
The present invention relates to an advanced method and apparatus for parallel computing which is versatile and suitable for use in computationally demanding applications, including real-time visualization of volumetric data.
2. Brief Description of the Prior Art
At present, most digital computers are similar in that they have (i) a central processing unit for performing computational tasks such as addition, multiplication, loading registers and comparisons, and (ii) a memory storage medium for storing data. To solve a particular problem by computing, human programmers first reduce the problem to a series of computational tasks, and then divide each computational task into a sequence of steps or instructions to provide a program. The central processing unit then executes the sequence of instructions one step at a time upon a data set in order to compute a solution to the formulated problem. For each computation to be performed, the appropriate data set must be retrieved from the memory and brought to the central processing unit, where it is operated upon in accordance with the program before being returned to memory. This type of computing machine design is called sequential or serial because the processing operations are performed one at a time, in a sequence or series.
One major drawback of serial computing machines is that while the central processor is kept active, most of the memory is idle during processing operations. Another major drawback is that serial computing machines are inherently slow, since during each phase of a computation, many of the components of the processor are idle.
Hitherto, the development of interleaved memory, pipelining, vector processing and very long instruction word (VLIW) machinery has helped to increase the speed and efficiency of single-processor serial computers. However, there are numerous applications for which even very fast serial computers are simply inadequate. For example, a large number of problems presently require hundreds of millions of computations per second. Such problems include, for example, simulation of atomic or particle interaction in the fields of computational physics and chemistry; simulation of gravitational interplay among celestial objects in the field of computational cosmology; simulation of biological cell processes in the field of computational biology; climate modeling and forecasting in the field of computational meteorology; air-traffic control; flight simulation; and knowledge-base searching in artificial intelligence (AI) systems. Commercially available serial computing machines have simply been too slow for such computationally demanding applications.
One solution to the problem posed by serial computing machines has been to use a parallel processing design in which many small processors are linked together to work simultaneously so that both memory capacity and processing capacity can be utilized with high efficiency. To date, a number of parallel computing machines have been constructed. In general, the character and performance of such computing machines are determined by three factors: (i) the nature, size and number of the processing elements; (ii) the nature, size and number of the memory nodes; and (iii) the strategy of interconnecting the processors and the memories.
One type of parallel computing machine in commercial use is known as a Single-Instruction-Stream-Multiple-Data-Stream (SIMD) machine. In general, a SIMD computing machine has a single control unit that broadcasts one instruction at a time to all of the processors, which execute that instruction simultaneously on multiple data sets. On the basis of the performance factors set forth above, commercially available SIMD computing machines can be grouped into two distinct classes.
The first class of SIMD computing machine includes numeric supercomputers and other parallel computing machines that operate on vectors by performing the same operation on each vector element. In general, each processor in this class of SIMD computing machinery is a vector processor consisting of an array of arithmetic logic units (ALUs) particularly adapted for processing vector-formatted data. Each vector processor is provided access to a common memory, and there are no interprocessor connections or other provisions for the parallel vector processors to share information among themselves. A typical program run on this class of SIMD computing machine includes many statements of the form: for i=1 to n, do a[i]=b[i]+c[i], where a, b and c are vectors. In essence, this class of SIMD machine receives two n-element vectors b[i] and c[i] as input and operates on corresponding elements in parallel using the vector ALUs to provide an n-element vector a[i] as output. The Cray-1 Supercomputer from Cray Research, Inc. is representative of this first class of SIMD computing machine.
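The statement above can be made concrete with a minimal Python sketch. It contrasts a serial loop, which performs one addition per step, with a strip-mined form that cuts the loop into strips of at most `vl` elements, each strip standing for one vector instruction whose element additions the vector ALUs would carry out together. The function names and the default strip length of 64 are assumptions for illustration (64 elements is the vector-register length of the Cray-1). Python itself executes each strip sequentially, so the sketch models only the instruction count, not the hardware speedup. Indices run from 0 rather than 1, as is usual in Python.

```python
from typing import List, Tuple


def serial_add(b: List[float], c: List[float]) -> List[float]:
    """Serial form: one scalar addition per step, i = 0 .. n-1."""
    a = [0.0] * len(b)
    for i in range(len(b)):
        a[i] = b[i] + c[i]
    return a


def vector_add(b: List[float], c: List[float], vl: int = 64) -> Tuple[List[float], int]:
    """Strip-mined form (hypothetical helper): the loop is cut into strips
    of at most `vl` elements. Each strip models one vector instruction whose
    element additions vector hardware would perform together.
    Returns the result vector and the number of strips ("instructions")."""
    if len(b) != len(c):
        raise ValueError("vectors must have equal length")
    a: List[float] = []
    instructions = 0
    for start in range(0, len(b), vl):
        end = min(start + vl, len(b))
        # One modeled vector instruction: add every element pair in the strip.
        a.extend(x + y for x, y in zip(b[start:end], c[start:end]))
        instructions += 1
    return a, instructions
```

For example, `vector_add(list(range(200)), [1] * 200)` yields the same 200 sums as `serial_add`, but in 4 strips (elements 0 to 63, 64 to 127, 128 to 191 and 192 to 199), whereas the serial form takes 200 steps.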
The second class of SIMD computing machine includes parallel computing machines that facilitate coordinated communication among the parallel processors. In general, each processor in this second class of SIMD computing machine is a simple ALU that is provided access to a local memory which it controls. Each ALU can communicate with other ALUs through a communication network having either a fixed or a programmable topology. In the Connection Machine computer from the Thinking Machines Corporation, 65,536 1-bit ALUs are configured as parallel processors, and an interprocessor communication network having the topology of an n-cube, or hypercube, is provided for the transfer of data among these parallel processors. As in other SIMD computing machines, a single control unit broadcasts instructions to the 65,536 independent ALU processors. Although the ALU processors are individually small and slow, the total computation and input/output throughput of the Connection Machine computer is quite substantial because of the combined power of its processing units and interprocessor communication system. Notably, because the Connection Machine computer has no program storage of its own, the instructions of the program must be downloaded while the program is running. Consequently, a high bandwidth is required between the host system and the Connection Machine computer, resulting in relatively long cycle times.
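The n-cube topology mentioned above can be sketched generically in Python. In an n-cube of 2^n nodes, each node carries an n-bit address and is linked to the n nodes whose addresses differ from its own in exactly one bit. A message can then be routed by correcting the differing address bits one dimension at a time (so-called dimension-order or e-cube routing), so that no route exceeds n hops. The function names are hypothetical, and the sketch describes a textbook hypercube, not the Connection Machine's actual assignment of processors to network nodes or its routing hardware.

```python
from typing import List


def neighbors(node: int, n: int) -> List[int]:
    """Nodes directly linked to `node` in an n-cube: flip each address bit once."""
    return [node ^ (1 << d) for d in range(n)]


def ecube_route(src: int, dst: int, n: int) -> List[int]:
    """Dimension-order (e-cube) routing in an n-cube.

    Corrects the bits in which the current node differs from `dst`,
    lowest dimension first, moving one link per corrected bit.
    Returns every node visited, including `src` and `dst`."""
    size = 1 << n
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("node address out of range for this n-cube")
    path = [src]
    node = src
    for d in range(n):
        if (node ^ dst) & (1 << d):
            node ^= 1 << d  # traverse the link along dimension d
            path.append(node)
    return path
```

For instance, `ecube_route(0, 15, 4)` visits 0, 1, 3, 7 and 15. A 16-dimensional cube has 2^16 = 65,536 nodes, yet each node needs only 16 links and any node reaches any other in at most 16 hops; this logarithmic growth of links and distance is what makes the topology attractive for large processor counts.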
While the Connection Machine and Cray-1 parallel computer systems each perform well in a number of advanced parallel computing applications, both are poorly suited for volume visualization applications. Consequently, a variety of special-purpose computing machines exploiting parallelism have been built to perform volume visualization tasks quickly. A number of prior art 3-D graphics-display and voxel-based computer graphic systems are described in detail in Applicant's U.S. Pat. No. 4,985,856, which is incorporated herein by reference.
While many existing voxel-based systems employ parallel data transfer and processing mechanisms dedicated specifically to volume projection and rendering tasks, such capabilities are available neither for scan-conversion of geometrically represented objects nor for processing of voxel-based objects without transferring data out of the 3-D memory storage device. Moreover, modifying a 3-D voxel-based object in prior art systems requires discarding the voxel-based object, creating a new geometrically represented object with the desired modifications, and then scan-converting (i.e., voxelizing) the modified geometric object. This process is costly in both computation time and expense. Additionally, while these prior art voxel-based systems permit volume visualization of 3-D objects, they provide only a limited number of directions along which volumetrically represented data can be visualized without irreversibly modifying the original voxel data.
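Scan-conversion (voxelization), as referred to above, converts a geometric description into a grid of voxels. The minimal Python sketch below voxelizes a sphere by testing the center of every voxel against the sphere equation; this point-sampling rule and the function name are assumptions chosen for illustration and are not the method of any particular prior art system. It also shows why the modify-and-rescan workflow is costly: changing even one parameter of the sphere means discarding the grid and visiting all size^3 voxels again.

```python
from typing import List

Grid = List[List[List[int]]]


def voxelize_sphere(size: int, cx: float, cy: float, cz: float, r: float) -> Grid:
    """Scan-convert a sphere into a size x size x size binary voxel grid
    (hypothetical helper). A voxel is set to 1 when its center lies inside
    or on the sphere of radius r centered at (cx, cy, cz)."""
    grid = [[[0] * size for _ in range(size)] for _ in range(size)]
    r2 = r * r
    for x in range(size):
        for y in range(size):
            for z in range(size):
                # Sample the sphere equation at the voxel center.
                dx, dy, dz = x + 0.5 - cx, y + 0.5 - cy, z + 0.5 - cz
                if dx * dx + dy * dy + dz * dz <= r2:
                    grid[x][y][z] = 1
    return grid
```

For example, `voxelize_sphere(8, 4.0, 4.0, 4.0, 1.0)` sets exactly the 8 voxels whose centers lie at a distance of sqrt(0.75) from the sphere center; enlarging the radius requires a fresh pass over all 512 voxels of the grid.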
Thus, there is a great need in the parallel computing art to provide a versatile method and apparatus for parallel computing which permits high levels of computational performance in numeric, symbolic and volume visualization applications in diverse fields of science, art and commerce.