This invention is in the field of computer architecture and, more particularly, relates to parallel processing and multiprocessor computer systems.
Current approaches to parallel processing include Single Instruction Multiple Data (SIMD) machines, Multiple Instruction Multiple Data (MIMD) machines, systolic arrays, and data flow machines. SIMD machines use a single sequential instruction stream to control parallel arithmetic units. MIMD computers have a large number of sequential processors which are connected to memory elements through a routing network. Systolic arrays such as WARP and iWARP pass data through a communications network in such a way that operands arrive simultaneously at a processing element. Data flow machines pass data tokens to processing elements which "fire" after all operands arrive.
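By way of illustration only, the firing rule of a data flow machine can be sketched as follows. This is a minimal sketch and not a description of any particular machine: the `Node` class, its `receive` method, the port numbering, and the `run_example` function are hypothetical names introduced here. The sketch assumes each processing element has a fixed number of input ports and fires exactly once when every port holds a token.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical processing element that fires once all operands arrive."""
    name: str
    arity: int                      # number of input ports (assumption)
    op: Callable[..., float]        # operation applied when the node fires
    operands: Dict[int, float] = field(default_factory=dict)

    def receive(self, port: int, value: float) -> Optional[float]:
        """Store a token; return a result only if the node fires."""
        self.operands[port] = value
        if len(self.operands) < self.arity:
            return None             # still waiting for operands
        args = [self.operands[i] for i in range(self.arity)]
        self.operands.clear()       # tokens are consumed by firing
        return self.op(*args)


def run_example() -> List[Tuple[str, float]]:
    """Evaluate (2 + 3) * 4 with tokens arriving in arbitrary order."""
    add = Node("add", 2, lambda a, b: a + b)
    mul = Node("mul", 2, lambda a, b: a * b)
    fired: List[Tuple[str, float]] = []

    add.receive(0, 2.0)             # first operand: no firing yet
    total = add.receive(1, 3.0)     # second operand: "add" fires
    fired.append(("add", total))

    mul.receive(1, 4.0)             # constant arrives before the sum
    product = mul.receive(0, total) # sum arrives: "mul" fires
    fired.append(("mul", product))
    return fired
```

In this sketch the order of token arrival does not affect the result; execution is driven solely by operand availability rather than by a sequential instruction stream.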
In addition to the many possible architectures for parallel processing machines using known discrete microprocessors, developments in Integrated Circuit (IC) fabrication have created new possibilities for multiprocessor computer systems. The ever-increasing density of Very Large Scale Integration (VLSI) components challenges computer architects to find the best use for the added silicon area. Today it is feasible to include a processor, a floating-point arithmetic unit, and a small first-level cache on a single IC. Higher levels of integration have greatly improved single-processor performance, largely by eliminating inter-circuit signal delays in the critical paths to the memory caches.
As circuit densities continue to increase, the next steps needed to improve performance are not clear. Increasing the size of the on-chip caches will improve performance, but this "solution" soon reaches a point of diminishing returns. Another approach is to include multiple function units and to allow multiple instructions to be dispatched simultaneously. These "superscalar" designs also improve performance, but the techniques used do not scale beyond a few instruction dispatches per cycle.
Advances in Wafer Scale Integration (WSI) will add even more pressure to find creative uses for the vast increases in available circuit area. It is likely that the next major advance in processing speed will result from the incorporation of multiple parallel processors on the same silicon device. Many current approaches to parallel processing define the problem in terms of selecting an optimal interconnection network for multiple high-performance microprocessors.
Although much work has been done on parallel processors and their interconnection networks, the problems with known approaches include the complexity of the interconnection network and the necessity of frequent and time-consuming memory access. A parallel processing architecture which could utilize a relatively simple interconnection network and reduce the number of needed main memory accesses would be a significant advance in parallel computing.