1. Field of the Invention
This invention relates to data transfer systems for massively parallel processors, and more specifically to data addressing and the transfer of data between a number of SIMD parallel processors arranged in a cluster and a common cluster memory.
2. Background of the Invention
Parallel processors have been developed that are based on the concurrent execution of the same instruction by a large number of relatively simple "processor elements" operating on respective data streams. These processors, known as single instruction, multiple data ("SIMD") processors, are useful in such applications as image processing, signal processing, artificial intelligence, data base operations, and simulations.
Typically, a SIMD processor includes an array of processor elements and a routing network over which operands and the results of calculations are communicated among the processor elements and input/output ("I/O") devices. The operations of the processor elements and of the routing network are controlled by a separate control processor, in response to instructions and data furnished from a computer subsystem.
A recent SIMD processor is described in U.S. Pat. No. 4,314,349, issued Feb. 2, 1982 to Batcher. A processing element constitutes the basic building block, and each processing element is connected to a uniquely associated random access memory ("RAM") by a bi-directional data bus. The data bus is the main data path for the processing element. During each machine cycle, one bit of data can be transferred from any one of six sources, viz. a bit read from the RAM, the state of the B, C, P, or S register, or the state of an equivalence function. The destination of a data bit on the data bus may be one or more of, for example, the following: the address location in the RAM, the A, G, or S register, the logic associated with the P register, the input to a sum-OR tree, and the input to the parity tree. It will be appreciated that during a memory I/O operation, the bus is reserved for memory data, and other operations requiring bus access may not proceed.
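The single-bit bus discipline described above can be illustrated with a minimal sketch. It is only a toy model under stated assumptions, not the circuit of the Batcher patent: the class name, the 16-bit RAM size, and the `eq_state` field are hypothetical, and the P-register logic, sum-OR tree, and parity tree are treated as passive observers of the bus bit.

```python
# Toy model of one processing element with a one-bit data bus.
# All names and sizes here are hypothetical, chosen for illustration only.
SOURCES = ("RAM", "B", "C", "P", "S", "EQ")            # the six bus sources
DESTINATIONS = ("RAM", "A", "G", "S", "P_LOGIC", "SUM_OR", "PARITY")


class ProcessingElement:
    def __init__(self, ram_bits=16):
        self.regs = {name: 0 for name in ("A", "B", "C", "G", "P", "S")}
        self.ram = [0] * ram_bits
        self.eq_state = 0   # state of the equivalence function, set externally
        self.cycles = 0     # machine cycles consumed on the bus

    def bus_cycle(self, source, dests, addr=None):
        """Move exactly one bit from one source to one or more destinations."""
        if source not in SOURCES:
            raise ValueError(f"unknown source {source!r}")
        if any(d not in DESTINATIONS for d in dests):
            raise ValueError("unknown destination")
        if source == "RAM":
            bit = self.ram[addr]
        elif source == "EQ":
            bit = self.eq_state
        else:
            bit = self.regs[source]
        for d in dests:
            if d == "RAM":
                self.ram[addr] = bit
            elif d in ("A", "G", "S"):
                self.regs[d] = bit
            # P_LOGIC, SUM_OR and PARITY only observe the bit in this model.
        self.cycles += 1    # the bus is occupied for this whole cycle
        return bit


pe = ProcessingElement()
pe.ram[3] = 1
pe.bus_cycle("RAM", ["A", "G"], addr=3)   # memory read: bus carries memory data
pe.bus_cycle("B", ["S"])                  # register move must wait for its own cycle
```

Because every transfer passes through `bus_cycle`, a memory read and a register-to-register move can never share a cycle in this model, which is the serialization the paragraph above points out.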
The SIMD processor described in U.S. Pat. No. 4,805,173, issued Feb. 14, 1989 to Hillis et al. includes an error control and correction technique which operates across multiple processors and multiple computer memories. Each integrated circuit includes sixteen processors which are connected to an associated memory through a memory interface. The memory is in the form of 22 4K×1 bit RAMs. Each of 16 4K×1 slices functions as the memory for a different one of the 16 processors, and the remaining 6 4K×1 bit slices store parity or syndrome bits for the data stored in the memories of the 16 processors. Parallel data is read from or written to each integrated circuit at the address specified by an address decoder. In a memory read operation, the memory is read in parallel one row at a time to produce data outputs on 16 output lines and parity outputs on 6 additional output lines. These signals then are applied in parallel to error control circuitry for detection and correction of parity errors. In a memory write operation, data is written in parallel from the 16 processors into the 16 memory slices at the given address, and the 6 syndrome outputs are written into the 6 other memory slices at the same addresses and at the same time as the data used in generating the syndrome outputs are stored in the 16 memory slices. It will be appreciated that the error control and correction technique requires that all 16 memory slices be read or written in parallel.
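The row-wide organization described above can be sketched as follows. The text does not name the code used for the 6 check slices, so this sketch assumes a single-error-correcting, double-error-detecting (SEC-DED) Hamming code, which happens to need exactly 6 check bits for 16 data bits. The names `SlicedMemory`, `check_bits`, `write_row`, and `read_row` are hypothetical.

```python
# Illustrative model only: 22 one-bit-wide slices of 4K rows each.
# A SEC-DED Hamming code is ASSUMED for the 6 check slices; the source
# text does not specify the actual code.
ROWS = 4096
PARITY_POS = [1, 2, 4, 8, 16]                         # Hamming parity positions
DATA_POS = [p for p in range(1, 22) if p & (p - 1)]   # the 16 remaining positions


def check_bits(data):
    """Return 5 Hamming parity bits plus 1 overall parity bit for 16 data bits."""
    checks = [0] * 5
    for pos, bit in zip(DATA_POS, data):
        for i, p in enumerate(PARITY_POS):
            if pos & p:
                checks[i] ^= bit
    overall = 0
    for bit in list(data) + checks:
        overall ^= bit
    return checks + [overall]


class SlicedMemory:
    def __init__(self):
        # Slices 0..15 hold one bit per processor; slices 16..21 hold check bits.
        self.slices = [[0] * ROWS for _ in range(22)]

    def write_row(self, addr, data):
        """Write 16 data bits and their 6 check bits at the same address."""
        if len(data) != 16:
            raise ValueError("a row carries exactly 16 data bits")
        for mem_slice, bit in zip(self.slices, list(data) + check_bits(data)):
            mem_slice[addr] = bit

    def read_row(self, addr):
        """Read all 22 slices in parallel, then detect and correct errors."""
        bits = [mem_slice[addr] for mem_slice in self.slices]
        data, stored = bits[:16], bits[16:]
        recomputed = check_bits(data)
        # Sum the positions of the mismatching Hamming parity bits.
        syndrome = sum(p for p, a, b in zip(PARITY_POS, stored, recomputed) if a != b)
        total = 0
        for bit in bits:
            total ^= bit                  # 0 for any valid 22-bit row
        if total == 0:
            if syndrome == 0:
                return data, "ok"
            raise ValueError("uncorrectable: two bits in error")
        if syndrome in DATA_POS:
            data[DATA_POS.index(syndrome)] ^= 1   # flip the faulty data bit
        elif syndrome not in (0, *PARITY_POS):
            raise ValueError("uncorrectable: invalid syndrome")
        return data, "corrected"


mem = SlicedMemory()
row = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
mem.write_row(5, row)
mem.slices[3][5] ^= 1                     # inject a single-bit fault in slice 3
recovered, status = mem.read_row(5)
```

Note that `read_row` cannot compute the syndrome without the bits from every slice at that address, which is why a scheme of this kind requires all slices to be read or written together.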
A SIMD processor related to the Hillis et al. '173 processor appears in U.S. Pat. No. 4,791,641, issued Dec. 13, 1988 to Hillis. The error correction system of the Hillis patent treats data for plural memories associated with plural processors as a unitary data word, and generates an error code for the unitary data word. It will be appreciated that the error correction system contemplates that the unitary data word be read or written as a unit.
In parallel processor systems, the size of the memory per processor element tends to be small. Nonetheless, because of the great number of processor elements, the amount of memory required by a parallel processor is large. Unfortunately, memory such as SRAM with speeds comparable to that of even simple microprocessors tends to be expensive. Less expensive memory such as DRAM, on the other hand, is relatively slow, and its use in a parallel processor would be expected to compromise performance by causing the processor elements to idle while each memory operation is completed.