In a SIMD computer, such as the Connection Machine (Reg. T.M. of Thinking Machines Corporation, Cambridge, MA) computer system, the architecture is designed to support a data parallel style of programming. In this style one programs assuming a separate processor for every data element, so that one may effectively operate on all data elements in parallel.
The Connection Machine computer system supports such a style of programming by providing tens of thousands of individual hardware data processors, each with its own memory for holding a data element. (Current standard Connection Machine system configurations provide 16,384 processors and 65,536 processors.) The data processors all process instructions issued on a centrally controlled instruction bus, so that at any given time all processors (or all processors in a large group) are executing the same instruction. The instruction bus is driven by a front end computer, which is a conventional single-processor computer such as a Symbolics 3600 computer or a Digital Equipment Corporation VAX computer.
For example, if an ADD instruction is issued, then all processors perform addition, each on its own data. (Most instructions are conditional, so that a flag bit in each processor becomes an additional implicit input to the operation, and the operation's results are stored only in processors whose flag bit is 1.) Many of the usual arithmetic and logic instructions found in contemporary computer instruction sets (such as SUBTRACT, MULTIPLY, DIVIDE, MAX, MIN, COMPARE, LOGICAL AND, LOGICAL OR, LOGICAL EXCLUSIVE OR, and floating-point instructions) are provided in this form; when one such instruction is issued, it is performed (possibly conditionally) by every hardware processor, each on its own data.
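The conditional ADD described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions: the function name simd_add, the use of list positions to stand for processors, and the sample values are hypothetical and are not taken from the Connection Machine instruction set.

```python
def simd_add(a, b, dest, flags):
    """Simulate one conditional ADD issued to every processor at once.

    Each list position stands for one (hypothetical) processor.
    a, b   : the two operands held in each processor's memory
    dest   : the current contents of each processor's result location
    flags  : each processor's flag bit (1 = store the result, 0 = leave dest alone)
    """
    # Every processor takes part in the same instruction, but the sum is
    # stored only where the flag bit is 1.
    return [x + y if f == 1 else d
            for x, y, d, f in zip(a, b, dest, flags)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
dest = [0, 0, 0, 0]
flags = [1, 0, 1, 1]

print(simd_add(a, b, dest, flags))  # [11, 0, 33, 44]
```

In this sketch the second processor has its flag bit cleared, so its result location keeps its previous value while the other three store their sums.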
Other computer systems of this general style have also been built. Prominent among these are the ICL DAP and the Goodyear MPP. A typical difficulty with these computer systems is that programming becomes much more complicated if the number of data elements in the problem to be solved exceeds the number of hardware processors. The Goodyear MPP, for example, provides 16,384 hardware processors configured in a 128 × 128 two-dimensional grid. If a problem requires the processing of 200 × 200 elements (total 40,000), the programming task is much more difficult because one can no longer assign one data element to each processor, but must assign multiple data elements to some processors (at least three to some, since 40,000 exceeds 2 × 16,384). Even if a problem requires no more than 16,384 data elements, if they are to be organized as a 64 × 256 grid rather than a 128 × 128 pattern, programming is again complicated, this time because the problem communication structure does not match the hardware communication structure.
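The arithmetic behind both difficulties can be sketched as follows. The function names (max_load, even_split, fold) and the particular folding scheme are assumptions made for illustration; they do not describe how the Goodyear MPP or any other machine actually assigns data.

```python
def max_load(num_elements, num_processors):
    """Smallest possible load on the busiest processor (ceiling division)."""
    return -(-num_elements // num_processors)


def even_split(num_elements, num_processors):
    """Spread elements as evenly as possible.

    Returns (base, extra): 'extra' processors hold base + 1 elements,
    and the remaining processors hold base elements.
    """
    return divmod(num_elements, num_processors)


def fold(row, col, hw_side=128, rows=64):
    """One hypothetical way to place a 64 x 256 problem grid onto a
    128 x 128 hardware grid: the right half of the problem grid is
    placed below the left half."""
    return (row + rows * (col // hw_side), col % hw_side)


# 200 x 200 = 40,000 elements on 16,384 processors.
print(max_load(200 * 200, 128 * 128))    # 3
print(even_split(200 * 200, 128 * 128))  # (2, 7232)

# Problem-grid neighbors (0, 127) and (0, 128) land far apart in hardware.
print(fold(0, 127))  # (0, 127)
print(fold(0, 128))  # (64, 0)
```

Under these assumptions, an even distribution leaves 7,232 processors holding three elements and the other 9,152 holding two. In the folded layout, two elements that are adjacent in the 64 × 256 problem grid end up 64 rows and 127 columns apart on the hardware grid, which illustrates the mismatch between problem and hardware communication structure.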