The present invention relates to parallel processing computer systems and in particular to systems employing an array of processor elements using an SIMD architecture. One example of such a system is described and claimed in U.K. Patent GB-A-1445714 and corresponding U.S. Pat. No. 3,979,728.
A number of different approaches have been adopted to increase the power of such systems. It is possible to increase the power of individual processor elements by, for example, increasing the width of the operands handled by the ALU or increasing the range of functions offered by each ALU. Such approaches suffer the disadvantage of tending to sacrifice the flexibility in operation and the high levels of integration which are the primary design goals of SIMD systems. They also offer poor compatibility with existing systems based on simple single bit processor elements. In practice, using processor elements modified in this manner to replace a conventional array requires that the operating systems for the array be rewritten from scratch, and this provides a major disincentive to the upgrading of the array.
An alternative approach is to supplement an existing array with a co-processor. This has the advantage of providing greater backwards compatibility but has the disadvantage of tending to require software which is less integrated in nature, since entire routines must be implemented either in the main array or in the co-processor.
With single bit processors it is necessary, within an arithmetic operation such as floating point add, to access bits of the operands and intermediate results in memory repeatedly, so that memory bandwidth tends to limit performance. One approach to increasing the memory bandwidth would be to provide several independent paths to memory for each processor. This would imply either more than one memory per processor, or multiple ports into a single memory. Either of these is more costly than providing just a single memory path. The cost is even higher if the processors are on a chip and the memory is external to the chip, since more data pins are then required for the chip.
It is known, in a convolution processing system for handling image data, to connect the output of an arithmetic unit to a shift register. A selected output of the shift register is then connected to the input of a neighbouring arithmetic unit. Such an arrangement is disclosed in U.K. Patent GB-A-2180968 and its corresponding U.S. Pat. No. 4,907,182.
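By way of illustration only, the memory access burden of bit-serial arithmetic noted above may be sketched as follows. The sketch is not taken from any of the cited patents. It assumes a single processor element with a one-bit-wide memory holding operands least significant bit first, and it uses a plain unsigned integer add in place of the floating point add mentioned above, which would require further passes over the same bits. The names `bit_serial_add`, `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(value, width):
    """Return value as a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an integer from a list of bits, least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(memory, a_addr, b_addr, sum_addr, width):
    """Add two unsigned operands one bit at a time, as a single bit ALU would.

    memory is a list of bits standing in for the one-bit-wide memory of a
    single processor element (an assumption of this sketch). Returns the
    number of single-bit memory accesses performed and the final carry.
    """
    accesses = 0
    carry = 0  # assumed to be held in a register, so it costs no memory access
    for i in range(width):
        a = memory[a_addr + i]            # read one bit of the first operand
        b = memory[b_addr + i]            # read one bit of the second operand
        accesses += 2
        s = a ^ b ^ carry                 # full-adder sum bit
        carry = (a & b) | (carry & (a ^ b))  # full-adder carry out
        memory[sum_addr + i] = s          # write one bit of the result
        accesses += 1
    return accesses, carry


WIDTH = 8
# Memory layout assumed here: operand A, then operand B, then space for the sum.
memory = to_bits(100, WIDTH) + to_bits(57, WIDTH) + [0] * WIDTH
accesses, carry_out = bit_serial_add(memory, 0, WIDTH, 2 * WIDTH, WIDTH)
result = from_bits(memory[2 * WIDTH:3 * WIDTH])
print(result, carry_out, accesses)  # prints: 157 0 24
```

Under these assumptions an add of n-bit operands costs 3n single-bit accesses over the one memory path, which is why the number of paths to memory, rather than the ALU itself, tends to determine the time taken.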