The invention relates to array processing systems which incorporate a large number of processors that are interconnected in a regular connection structure and in which all of the processors receive the same instruction from a common control structure.
Digital computers may be grouped into four categories according to how the computer handles its instruction and data streams. The four categories are: Single Instruction stream--Single Data stream (SISD) machines, Single Instruction stream--Multiple Data stream (SIMD) machines, Multiple Instruction stream--Single Data stream (MISD) machines, and Multiple Instruction stream--Multiple Data stream (MIMD) machines. Serial computers, that is, systems that have a single processor, such as the familiar personal computer, fall within the first category. Parallel computers, which are systems that include more than one processor, fall within one of the last three categories.
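The four categories form a two-by-two grid over the number of instruction streams and the number of data streams. The following is a minimal Python sketch of that grid, offered only as an illustration; the names `FlynnClass` and `classify` are hypothetical and do not come from any particular system.

```python
from enum import Enum


class FlynnClass(Enum):
    """The four machine categories (hypothetical illustrative names)."""
    SISD = "Single Instruction stream, Single Data stream"
    SIMD = "Single Instruction stream, Multiple Data stream"
    MISD = "Multiple Instruction stream, Single Data stream"
    MIMD = "Multiple Instruction stream, Multiple Data stream"


def classify(instruction_streams: int, data_streams: int) -> FlynnClass:
    """Map counts of instruction and data streams onto one category."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine has at least one instruction and one data stream")
    multi_instr = instruction_streams > 1
    multi_data = data_streams > 1
    if not multi_instr and not multi_data:
        return FlynnClass.SISD
    if not multi_instr:
        return FlynnClass.SIMD
    if not multi_data:
        return FlynnClass.MISD
    return FlynnClass.MIMD
```

For example, a single-processor personal computer corresponds to `classify(1, 1)`, while an array of 16384 processing elements driven by one control unit corresponds to `classify(1, 16384)`, which returns `FlynnClass.SIMD`.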
Both serial and parallel computers are capable of exploiting some degree of parallelism. For example, by using pipelining, SISD machines can achieve temporal parallelism, in which the computer overlaps the execution of successive instructions from the instruction sequence, each instruction occupying a different stage of the pipeline at any given moment. Computers that have multiple processing elements can achieve spatial parallelism, in which different processing elements simultaneously process either different instructions or different data using the same instructions.
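The difference between temporal and spatial parallelism can be made concrete by counting cycles. The sketch below assumes an idealized machine: every pipeline stage takes one cycle, there are no stalls or hazards, and in the spatial case every data item needs the same number of cycles and the items are split as evenly as possible among the processing elements. The function names are hypothetical.

```python
import math


def unpipelined_cycles(stages: int, instructions: int) -> int:
    """Without pipelining, each instruction passes through every stage
    before the next instruction starts."""
    return stages * instructions


def pipelined_cycles(stages: int, instructions: int) -> int:
    """With an ideal pipeline, the first instruction takes `stages` cycles
    to fill the pipeline; after that one instruction completes per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


def spatial_cycles(cycles_per_item: int, items: int, processing_elements: int) -> int:
    """Spatial parallelism: the items are divided among the processing
    elements, which all work at the same time, so the total time is set by
    the element holding the most items."""
    return cycles_per_item * math.ceil(items / processing_elements)
```

Under these assumptions, 100 instructions on a 5-stage pipeline take `pipelined_cycles(5, 100)` = 104 cycles instead of 500, and 100 items needing 5 cycles each on 10 processing elements take `spatial_cycles(5, 100, 10)` = 50 cycles.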
The particular machine described in this application is an SIMD machine. In general, SIMD machines include an array of processing elements supervised by a central array control unit. In some SIMD machines, each processing element in the array also has its own local memory. In such machines, the array control unit distributes the data to be processed among the local memories so that each memory holds a different segment of the data. The array control unit then sends the same sequence of instructions to every processing element. Each processing element executes that sequence in synchronization with the other processing elements, but operates on its own locally stored segment of the data.
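The distribute-then-broadcast scheme described above can be sketched in a few lines of Python. This is a toy software model under stated assumptions, not the machine of this application: data words are plain integers, an "instruction" is a Python callable applied to one word, and the class names `ProcessingElement` and `ArrayControlUnit` are hypothetical.

```python
from typing import Callable, List

# Assumption: an instruction is modeled as a function from one data word to another.
Instruction = Callable[[int], int]


class ProcessingElement:
    """One processing element with its own local memory (hypothetical model)."""

    def __init__(self) -> None:
        self.local_memory: List[int] = []

    def execute(self, instr: Instruction) -> None:
        # Apply the broadcast instruction to every word in this element's local memory.
        self.local_memory = [instr(word) for word in self.local_memory]


class ArrayControlUnit:
    """Central control unit that supervises the array (hypothetical model)."""

    def __init__(self, num_pes: int) -> None:
        self.pes = [ProcessingElement() for _ in range(num_pes)]

    def distribute(self, data: List[int]) -> None:
        # Give each element a distinct, contiguous segment of the data.
        size = -(-len(data) // len(self.pes))  # ceiling division
        for i, pe in enumerate(self.pes):
            pe.local_memory = data[i * size:(i + 1) * size]

    def broadcast(self, program: List[Instruction]) -> None:
        # Every element receives the same instruction at the same step.
        # The inner loop runs sequentially in this simulation; in hardware
        # all elements would execute the instruction in the same cycle.
        for instr in program:
            for pe in self.pes:
                pe.execute(instr)

    def gather(self) -> List[int]:
        # Collect the local memories back into one list, in element order.
        return [word for pe in self.pes for word in pe.local_memory]
```

A usage example: `acu = ArrayControlUnit(4)`, then `acu.distribute(list(range(10)))` and `acu.broadcast([lambda x: x * 2, lambda x: x + 1])`; `acu.gather()` then yields `[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]`, each element having run the same two-instruction program on its own segment.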
One of the original motivations for developing SIMD machines and other parallel computers was to solve vector or matrix type problems, i.e., computations that have an inherently parallel structure. Characteristically, such problems involve performing the same sequence of operations on many different data elements. Often the computations can be organized so that there are no dependencies between the operations performed on different parts of the data. Such computations therefore readily lend themselves to being performed in parallel on an SIMD machine in which each processing element works on a different part of the data. Examples of computations for which algorithms have been developed to exploit the parallelism of SIMD machines include matrix multiplication, image processing, the fast Fourier transform (FFT), the solution of systems of partial differential equations, and parallel sorting.
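Parallel sorting illustrates how a problem can be organized so that operations on different parts of the data have no dependencies. One well-known example, shown here only as an illustration and not as the method of this application, is odd-even transposition sort on a linear array of processing elements in which each element holds one value and is connected to its two neighbours. In each phase every element executes the same "compare and exchange with your neighbour" instruction, and the pairs in a phase are disjoint. The Python sketch below runs the phases sequentially; the function name is hypothetical.

```python
from typing import List


def odd_even_transposition_sort(values: List[int]) -> List[int]:
    """Sort using n phases of neighbour compare-exchange.

    Position i stands for the value held by processing element i in a
    linear array. Each phase is one broadcast instruction applied to all
    disjoint neighbour pairs; n phases are sufficient to sort n values.
    """
    a = list(values)
    n = len(a)
    for phase in range(n):
        # Even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        start = phase % 2
        # The pairs within one phase do not overlap, so on an SIMD array
        # all of these compare-exchanges could happen in the same cycle.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

With one value per processing element, the sort finishes in n parallel steps rather than the roughly n log n comparisons a serial sort needs, which is the kind of gain these algorithms aim for.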
One should therefore be able to achieve a considerable speed advantage by performing operations in parallel on multiple processing elements in this fashion. Indeed, as a first guess one might expect the speed advantage to grow in proportion to n, the number of processing elements. In an attempt to exploit the speed advantage that should result from using a large number of processing elements, certain companies are manufacturing SIMD machines, sometimes referred to as massively parallel processors, in which the processing elements number in the tens of thousands. However, because of practical considerations such as inevitable memory access conflicts, the difficulty of fully exploiting the concurrency in computing problems, and design limitations, the actual speedup is usually considerably less than a factor of n. Taking these inherent difficulties into account, it has been estimated that the actual speedup falls within a range having a lower bound of $\log_2 n$ and an upper bound of $n/\ln n$.
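To see how far these estimates fall below the ideal factor of n, the two bounds can be evaluated directly. The Python sketch below simply computes $\log_2 n$ and $n/\ln n$ for a given n; the function name `speedup_bounds` is hypothetical, and the bounds are the estimates quoted above, not measurements.

```python
import math
from typing import Tuple


def speedup_bounds(n: int) -> Tuple[float, float]:
    """Return the estimated (lower, upper) speedup for n processing elements."""
    if n < 2:
        raise ValueError("n must be at least 2 so that ln n > 0")
    lower = math.log2(n)        # lower bound: log2 n
    upper = n / math.log(n)     # upper bound: n / ln n (math.log is the natural log)
    return lower, upper


if __name__ == "__main__":
    for n in (1024, 16384, 65536):
        lo, hi = speedup_bounds(n)
        print(f"n={n:>6}: {lo:5.1f} <= speedup <= {hi:8.1f}  (ideal {n})")
```

For a machine with 65536 processing elements, the estimated speedup lies between 16 and roughly 5,900, far short of the ideal factor of 65536.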