It has long been, and continues to be, a goal in the computer arts to produce a computing machine that can process large amounts of data in minimum time. In a conventional machine, instructions and data are forced to flow serially through a single, and hence central, processing unit (CPU). The bit width of the processor's address/data bus (e.g., 8, 16 or 32 bits) and the rate at which the CPU executes instructions (often measured in millions of instructions per second, “MIPS”) tend to act as critical bottlenecks that restrict the flow rate of data and instructions. CPU execution speed and bus width must be continuously pushed to higher levels if processing time is to be reduced.
Attention is being directed to a different type of computing architecture in which problems are solved not serially but by simultaneously processing data that is available in parallel, using multiple processing units. These machines are often referred to as parallel processing arrays. The advantage of parallel processing is straightforward. Even though each processing unit may have a finite, and therefore speed-limiting, processor bandwidth, an array of N such processors will have a total computation bandwidth of N times the processor bandwidth.
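By way of illustration only, the scaling relationship described above can be sketched in a few lines of Python. The function name, the choice of MIPS as the unit, and the example figures (10 MIPS per processor, 1024 processors) are hypothetical and are not taken from any particular machine; the sketch also assumes the idealized case in which every processor is kept fully busy.

```python
def aggregate_bandwidth(num_processors: int, per_processor_bandwidth: float) -> float:
    """Idealized total computation bandwidth of an array of identical processors.

    Assumes every processor works at its full rate at all times; messaging
    and input/output delays are deliberately ignored in this sketch.
    """
    if num_processors < 0 or per_processor_bandwidth < 0:
        raise ValueError("processor count and bandwidth must be non-negative")
    return num_processors * per_processor_bandwidth


# Hypothetical figures: one 10 MIPS processor versus an array of 1024 of them.
single_cpu_mips = aggregate_bandwidth(1, 10.0)
array_mips = aggregate_bandwidth(1024, 10.0)
print(f"single processor: {single_cpu_mips} MIPS, array: {array_mips} MIPS")
```

The product is an upper bound rather than a prediction, since it leaves out the time processors spend exchanging data with each other and with external devices.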
The benefits derived from increasing the size of a parallel array are countered by a limitation in the speed at which messages can be transmitted to and through the parallel array, i.e., from one processor to another or between one processor and an external (input/output) device. Inter-processor messaging is needed so that intermediate results produced by one processing unit can be passed on to another processing unit within the array. Messaging between the array's parallel memory structure and external I/O devices such as high speed disks and graphics systems is needed so that problem data can be quickly loaded into the array and solutions can be quickly retrieved. The array's messaging bandwidth at the local level is the maximum rate, in bits per second, at which one randomly located processor unit can send a message to any other randomly located processor unit.
Ideally, messaging should take place in parallel so that multiple processors communicate simultaneously, thereby giving the array a parallel messaging bandwidth of multiple times the serial bandwidth. In the best case, the number of simultaneous communications should equal the number of processors in the array so that all processors are able to communicate with each other at the same time. Unfortunately, practical considerations place limits on the speed and number of processors which can communicate with each other. Among these considerations are the maximum number of transistors and/or wires which can be defined on a practically-sized integrated circuit chip, the maximum number of integrated circuits and/or wires which can be placed on a practically-sized printed circuit board, and the maximum number of printed circuit boards which can be enclosed within a practically-sized card cage. Wire density is typically limited to a finite, maximum number of wires per square inch, and this tends to limit the speed of processor communications in practically-sized systems.
If the ultimate goal of parallel processing is to be realized (unlimited expansion of array size with concomitant improvement in solution speed and price/performance ratio), ways must be found to maximize the parallel messaging bandwidth so that the foregoing practical limits do not become new bottlenecks on the speed at which parallel machines can input problem data, exchange intermediate results within the array, and output a solution after processing is complete.