A parallel processor based on the concurrent execution of the same global instruction by a large number of relatively simple processing elements ("PEs") is conventionally referred to as a single-instruction multiple-data ("SIMD") machine. A SIMD processor is useful in applications such as image processing, artificial intelligence, data-base operations, matrix operations, and simulations.
In pure SIMD, each instruction is executed in exactly the same way on the data in each PE. However, in many applications, it is desirable that certain instructions be executed differently in different PEs depending on information, such as control or flag bits, supplied to or produced in the PEs. This is commonly referred to as "local control".
Batcher, U.S. Pat. No. 4,314,349, discloses a prior art massively parallel SIMD processor system having a primitive local control capability. The processor system in Batcher contains an array of 16,384 PEs, an array control unit ("ACU") for controlling the PEs, and an interconnection network that enables the PEs to communicate with one another. Each PE consists of an individual processor and a local memory. The ACU furnishes instructions for the PEs to execute in parallel on respective data streams.
Local control in each PE in Batcher is performed with a single control bit stored in a dedicated one-bit local control register. The value of the local control bit can differ from PE to PE--i.e., the control bit can be a logical "1" (hereafter simply "1") in some PEs and a logical "0" (hereafter simply "0") in other PEs. The control bit, in combination with a pair of global override signals supplied from the ACU, controls the clocking of certain registers in each PE.
More specifically, the control bit and one of the global override signals drive a NAND gate that controls the clocking of registers in the arithmetic sub-unit of the PE's processor. When the global override signal for the arithmetic sub-unit is "0", the registers in the arithmetic sub-unit are clocked only if the control bit is "1". If the control bit is "0", the registers in the arithmetic sub-unit are not clocked. This prevents data from being loaded into those registers and thereby disables the arithmetic sub-unit. When the global override signal for the arithmetic sub-unit is "1", all of its registers are clocked regardless of the control bit value. The control bit and the other global override signal drive another NAND gate that similarly controls the clocking of a register for the processor's logic sub-unit.
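The clocking behavior described above can be sketched in software. The following is a minimal illustration only: it models the stated behavior (registers are clocked when the override is "1", or when the control bit is "1"), not the gate-level polarity of the NAND circuit in Batcher. The function names `clock_enabled` and `step_arithmetic` are hypothetical and do not appear in the reference.

```python
def clock_enabled(control_bit: int, global_override: int) -> bool:
    """Return True if the sub-unit's registers are clocked in this PE.

    Behavioral model only (assumption): override "1" clocks the registers
    unconditionally; override "0" clocks them only if the control bit is "1".
    """
    return bool(global_override) or bool(control_bit)


def step_arithmetic(registers, new_values, control_bits, global_override):
    """Apply one global instruction across all PEs.

    A PE whose registers are not clocked keeps its old register value,
    which is how the arithmetic sub-unit is effectively disabled.
    """
    return [
        new if clock_enabled(ctrl, global_override) else old
        for old, new, ctrl in zip(registers, new_values, control_bits)
    ]
```

With control bits `[1, 0, 1, 0]` and the override at "0", only the first and third PEs load new values; with the override at "1", all four do.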
The dedicated type of local control described in Batcher involves only a single level of conditional execution. A more sophisticated type of local control entails nesting one level of conditional execution on another level of conditional execution, as arises in conditionally executing a statement, such as an IF-THEN-ELSE statement, that itself is conditional.
In prior art SIMD machines that have this two-level nesting type of local control, a single control bit stored in a dedicated one-bit control register is again typically used in implementing the local control. The control bit is initially set at "1" in a selected group of the PEs in such a SIMD system. This enables the selected PEs to execute an IF-THEN-ELSE statement. The control bit is initially set at "0" in the remaining PEs and disables them from executing the IF-THEN-ELSE statement.
Execution of the IF-THEN-ELSE statement in a selected PE is initiated by transferring (saving) the value of the control bit to a non-dedicated general-purpose working register elsewhere in the PE. The selected PE then calculates the IF condition of the statement and stores the "1" or "0" result in the control register. All of the selected PEs with the control bit at "1" execute the THEN portion of the statement. All of the selected PEs with the control bit at "0" are temporarily disabled.
Next, the value of the control bit is inverted in the selected PEs (only). Each selected PE whose control bit was at "1" during the THEN portion of the IF-THEN-ELSE statement now has its control bit at "0" and vice versa. All of the selected PEs with the control bit at "1" execute the ELSE portion of the statement. The selected PEs with the control bit now at "0" are temporarily disabled. Finally, the initial control bit value stored in the working register of each selected PE is transferred (restored) to its control register.
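The save, evaluate, invert, and restore sequence described above can be sketched as a small simulation. This is an illustrative model under stated assumptions, not an implementation of any particular prior art machine: the function name `run_if_then_else`, the list-per-PE representation, and the use of the saved working-register value to identify the selected PEs during inversion are all hypothetical. The count of transfer steps is a count of logical save/restore steps, not of hardware cycles, which the text does not specify.

```python
def run_if_then_else(data, control, cond, then_op, else_op):
    """Simulate a single-control-bit IF-THEN-ELSE across all PEs.

    data    -- one value per PE
    control -- initial control bit per PE (1 = selected, 0 = not selected)
    Returns (data, control, transfer_steps).
    """
    data = list(data)
    control = list(control)
    n = len(data)
    transfer_steps = 0

    # Save: copy the control bit to a general-purpose working register.
    working = list(control)
    transfer_steps += 1

    # Selected PEs compute the IF condition into the control register.
    for i in range(n):
        if control[i] == 1:
            control[i] = 1 if cond(data[i]) else 0

    # THEN portion: only PEs with control bit "1" execute.
    for i in range(n):
        if control[i] == 1:
            data[i] = then_op(data[i])

    # Invert the control bit in the selected PEs only
    # (assumption: selection is identified by the saved bit).
    for i in range(n):
        if working[i] == 1:
            control[i] ^= 1

    # ELSE portion: only PEs with control bit "1" execute.
    for i in range(n):
        if control[i] == 1:
            data[i] = else_op(data[i])

    # Restore: copy the saved bit back to the control register.
    control = list(working)
    transfer_steps += 1

    return data, control, transfer_steps


# Example: negate negative values, double the others; PE 2 is not selected.
result = run_if_then_else(
    [3, -2, 5, -7], [1, 1, 0, 1],
    cond=lambda x: x < 0,
    then_op=lambda x: -x,
    else_op=lambda x: 2 * x,
)
```

In this example the unselected PE keeps its value of 5 throughout, and the two transfer steps perform no arithmetic or logic work in any PE.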
During periods when the initial value of the control bit in the selected PEs is being transferred from the control register to the working register and vice versa, the PEs are not performing arithmetic or logic operations. That is, the PEs are basically idle. The instruction execution cycles needed for these transfers are non-productive. Eliminating these non-productive execution cycles would be highly desirable.
Also, when an IF-THEN-ELSE statement is handled in the preceding way, only some of the selected PEs are active during the THEN portion. The remainder are idle. The same thing occurs during the ELSE portion. It would be desirable if the provisions of the THEN and ELSE portions could, under some circumstances, be executed simultaneously in the selected PEs. A considerable number of execution cycles would be saved. Furthermore, it would be desirable to have more flexible local control in which certain operations, such as data-bit shifts, could be performed at values dependent on data in each PE rather than on global values.
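The distinction between a globally specified shift and a locally specified shift can be illustrated with a short sketch. It assumes, for illustration, that the "values" in question are shift amounts and that a left shift is intended; the text does not state either. The function names `shift_global` and `shift_local` are hypothetical.

```python
def shift_global(values, amount):
    """Every PE shifts its value by the same globally supplied amount."""
    return [v << amount for v in values]


def shift_local(values, amounts):
    """Each PE shifts its value by an amount held in that PE's own data."""
    return [v << a for v, a in zip(values, amounts)]
```

For values `[1, 2, 3]`, a global shift by 2 gives `[4, 8, 12]`, whereas local shift amounts `[0, 1, 3]` give `[1, 4, 24]`.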