1. Field of the Invention
The present invention relates generally to multiprocessor apparatus having parallel processors for execution of parallel related instruction streams and the compiling method for generating the streams. In its particular aspects, the present invention relates to apparatus and method for collective branching of execution by the processors.
2. Description of the Prior Art
Multiprocessor techniques for the exploitation of instruction level parallelism for achieving increased processing speed over that obtainable with a uniprocessor are known for VERY LONG INSTRUCTION WORD (VLIW) machine architecture. A theoretical VLIW machine consists of multiple parallel processors that operate in lockstep, executing instructions fetched from a single stream of long instructions, each long instruction consisting of a possibly different individual instruction for each processor. A run-time delay in the completion by any one processor of its individual instruction, due to unavoidable events such as memory access conflicts, delays the issuance of the entire next long instruction for all processors.
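By way of illustration only, the cost of lockstep issue can be sketched with a simple timing model. The function name `lockstep_cycles` and the latency table are hypothetical and are not taken from any cited machine; the sketch assumes that each long instruction completes only when its slowest individual instruction completes.

```python
def lockstep_cycles(latencies):
    """Total cycles for a lockstep machine (illustrative model only).

    latencies[i][p] is the assumed number of cycles processor p needs
    for its individual instruction within long instruction i.
    The next long instruction cannot issue until every processor is done,
    so each long instruction costs the maximum latency in its row.
    """
    return sum(max(row) for row in latencies)


# Three processors, three long instructions, no run-time delays.
no_delay = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

# Same program, but processor 1 suffers a memory access conflict
# (assumed to cost 4 cycles) in the second long instruction.
one_delay = [[1, 1, 1], [1, 4, 1], [1, 1, 1]]

print(lockstep_cycles(no_delay))   # 3
print(lockstep_cycles(one_delay))  # 6
```

In this model the single delayed processor costs all three processors three extra cycles, which is the behavior described above.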
Known MULTIPLE INSTRUCTION STREAM MULTIPLE DATA STREAM (MIMD) architecture enables processors to operate independently when the long instructions are partitioned into separate or multiple streams. By independence, we mean that a run-time delay in one stream need not immediately delay execution of the other streams. Such independence, however, cannot be complete, since a mechanism must be provided to enable the processors to periodically synchronize at barrier points in the instruction streams. In the prior art, such barrier points have been fixed, and processors have been equipped for issuing an "I GOT HERE" flag to a barrier coordinating unit when a barrier is reached by the processor, which then stalls or idles until receipt from said unit of a "GO" instruction issued when all processors have issued their "I GOT HERE" flags. Illustrative are U.S. Pat. Nos. 4,344,134; 4,365,292; and 4,212,303 to Barnes individually or with others.
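The fixed-barrier handshake described above may be sketched in software as follows. This is a minimal model using threads in place of processors; the class `BarrierCoordinator`, its method `i_got_here`, and the helper `run_streams` are hypothetical names chosen for illustration and do not reproduce the hardware of the cited patents.

```python
import threading


class BarrierCoordinator:
    """Hypothetical coordinating unit: collects one flag per processor,
    then releases all of them together."""

    def __init__(self, n_processors):
        self.n = n_processors
        self.flags = 0          # "I GOT HERE" flags received so far
        self.generation = 0     # incremented each time "GO" is issued
        self.cond = threading.Condition()

    def i_got_here(self):
        with self.cond:
            gen = self.generation
            self.flags += 1
            if self.flags == self.n:
                # Last arrival: issue "GO" and reset for the next barrier.
                self.flags = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                # Stall until "GO" is issued for this barrier.
                while gen == self.generation:
                    self.cond.wait()


def run_streams(n_processors=3):
    unit = BarrierCoordinator(n_processors)
    log, log_lock = [], threading.Lock()

    def stream(pid):
        with log_lock:
            log.append(("before", pid))   # work preceding the barrier
        unit.i_got_here()
        with log_lock:
            log.append(("after", pid))    # work following the barrier

    threads = [threading.Thread(target=stream, args=(p,))
               for p in range(n_processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_streams(3))
```

Whatever order the threads are scheduled in, every "before" entry appears in the log ahead of every "after" entry, since no stream passes the barrier until the last flag has been received.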
Heretofore, lockstep operation of parallel processors has been necessary for the execution of a branching instruction, i.e. the evaluation of a condition, testing the resulting value, and executing a branch, such as selectively jumping to an instruction address based upon or defined by the result of the test. The lockstep operation ensures that the various processors take the same branch or jump. While collective branching is important to fully exploit instruction level parallelism, the prior art has generally restricted multiple instruction stream architecture to execution of programs with no data-dependent branching (see for example said U.S. Pat. No. 4,365,292 at column 3, lines 48-53).
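The reason lockstep operation guarantees a common branch may be seen in a small interpreter sketch. It models only the prior-art lockstep case, assuming a single program counter shared by all processors and a shared register file; the opcodes `add`, `brnz`, and `nop`, and the function `run_lockstep`, are hypothetical and chosen for illustration.

```python
def run_lockstep(program, regs, max_steps=1000):
    """Execute long instructions in lockstep (illustrative model only).

    program: list of long instructions, each a tuple of per-processor slots.
    regs:    dict standing in for an assumed shared register file.
    Returns the sequence of program-counter values visited.
    """
    pc = 0          # one program counter shared by every processor
    trace = []
    while pc < len(program) and len(trace) < max_steps:
        trace.append(pc)
        old = dict(regs)        # conditions are tested on start-of-cycle values
        next_pc = pc + 1
        for slot in program[pc]:
            op = slot[0]
            if op == "add":
                _, reg, k = slot
                regs[reg] = old[reg] + k
            elif op == "brnz":
                # Evaluate the condition, test it, and select the jump target.
                _, reg, target = slot
                if old[reg] != 0:
                    next_pc = target
            # "nop" does nothing
        pc = next_pc            # all processors follow the same branch
    return trace


regs = {"n": 3, "acc": 0}
program = [
    (("add", "acc", 2), ("add", "n", -1)),   # 0: loop body, two processors
    (("brnz", "n", 0), ("nop",)),            # 1: branch back while n != 0
    (("add", "acc", 100), ("nop",)),         # 2: code after the loop
]
print(run_lockstep(program, regs))  # [0, 1, 0, 1, 0, 1, 2]
print(regs)                         # {'n': 0, 'acc': 106}
```

Because the branch outcome updates the one shared program counter, no processor can take a different path from the others; with separate instruction streams there is no such shared counter, which is the difficulty noted above.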