The present invention relates to improved means and methods for controlling the sequencing and branching of instructions in a digital data processing system.
As is well known, higher performance computers often include mechanisms to increase the effective amount of concurrency or parallelism of actions in the machine. One common technique for increasing parallelism is to "pipeline" the execution of a single stream of instructions. Several consecutive instructions can be concurrently processed, so that the average time between instruction completions is less than the time to process a single instruction.
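By way of illustration only, the benefit of pipelining can be sketched with a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes one cycle and no instruction is ever delayed. The function names and the five-stage, 100-instruction figures are hypothetical and are not taken from any particular machine.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles when a new instruction enters the pipeline every cycle.

    Assumes an ideal pipeline: one cycle per stage and no stalls.
    """
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one
    # completes one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical workload and pipeline depth
    print(unpipelined_cycles(n, k))  # 500
    print(pipelined_cycles(n, k))    # 104
```

Under these assumptions the average time between instruction completions approaches one cycle as the instruction count grows, even though each individual instruction still takes five cycles to process.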
With conventional instruction sets, the possible gain from pipelining is limited by the fact that an instruction's results influence the behavior of subsequent instructions. Many instructions must be delayed until their inputs are made available by prior instructions. Also, pipelining is less effective when the instruction stream is not simply a sequence of consecutive, unconditional instructions. It is then uncertain whether a particular instruction should even be executed until all prior branch addresses and branch conditions have been fully resolved. It becomes very difficult to look ahead beyond these branch points to find useful work for the instruction pipeline.
One approach used by some pipelined machines is to suspend the flow of new instructions until branches are fully resolved. In such a case, no mistakes are made in executing the wrong instructions, but the pipeline empties frequently and is ineffective for use with program codes having frequent branches.
Other pipelined machines handle conditional branches by causing the flow to take a particular assumed branch. The assumed branch is then processed provisionally. When the correct branch condition is finally determined, the assumed direction of flow is either confirmed or refuted. If it is refuted, the provisional instructions are discarded or "undone" before switching to the correct instruction sequence. This scheme maintains high parallelism if the assumption is usually correct. When it is wrong, pipeline capacity is wasted by execution of irrelevant instructions and by the time to refill the pipeline with relevant instructions.
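The trade-off between suspending the flow at each branch and provisionally following an assumed branch can be sketched with a rough cost model. The model below is an assumption made for illustration only: it charges one cycle per instruction in the ideal case, and a fixed number of lost cycles (the hypothetical parameter `resolve_delay`) whenever the pipeline either waits for a branch to resolve or discards a wrongly assumed path and refills. The names and sample values are hypothetical.

```python
def cpi_suspend(branch_fraction: float, resolve_delay: float) -> float:
    """Average cycles per instruction when every branch suspends the flow.

    Every branch costs resolve_delay idle cycles, right or wrong.
    """
    return 1.0 + branch_fraction * resolve_delay


def cpi_assumed(branch_fraction: float, resolve_delay: float,
                p_correct: float) -> float:
    """Average cycles per instruction when an assumed branch is followed.

    Only a refuted assumption costs resolve_delay cycles; a confirmed
    assumption is treated as free (a simplifying assumption).
    """
    return 1.0 + branch_fraction * (1.0 - p_correct) * resolve_delay


if __name__ == "__main__":
    # Hypothetical figures: 20% branches, 4 cycles lost per resolution.
    print(cpi_suspend(0.2, 4))
    print(cpi_assumed(0.2, 4, p_correct=0.9))
    print(cpi_assumed(0.2, 4, p_correct=0.0))  # always wrong: no gain
```

In this simplified model the assumed-branch scheme never does worse than suspension, and its advantage shrinks in direct proportion to how often the assumption is refuted.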
Another approach employed by some machines is to explore both possible paths following a two-way conditional branch. This exploration could simply involve reading the memory words containing the provisional instructions, or it could involve full execution of one or both paths. This improves the rate of execution for the correct path. However, it requires extra hardware which will have at most 50% effective utilization, and the processing of the incorrect path slows down the correct path somewhat by contending for memory and processor resources. Furthermore, the performance gained by exploring both alternative paths is limited by the presence of additional conditional branches in these paths.
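The limitation imposed by additional conditional branches can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that a machine tried to explore every path past each unresolved two-way branch; real machines of this kind need not do so, and the function names are hypothetical.

```python
def paths_in_flight(unresolved_branches: int) -> int:
    """Number of alternative paths if every two-way branch is explored both ways."""
    return 2 ** unresolved_branches


def useful_fraction(unresolved_branches: int) -> float:
    """Fraction of the exploration effort spent on the one correct path."""
    return 1.0 / paths_in_flight(unresolved_branches)


if __name__ == "__main__":
    for depth in range(1, 4):
        print(depth, paths_in_flight(depth), useful_fraction(depth))
```

With one unresolved branch the useful fraction is one half, matching the 50% utilization bound noted above. Each further unresolved branch halves it again under this assumption.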
A further approach employed by some computers is to predict the outcome of each conditional branch, based on available information, and provisionally execute only the predicted path. Examples of the use of this prediction approach can be found in the articles: "An Analysis of Instruction-Fetching Strategies in Pipelined Computers", R. W. Holgate and R. N. Ibbett, IEEE Transactions on Computers, Vol. C-29, No. 4, pp. 325-329, April 1980, and "The S-1 Project: Developing High-Performance Digital Computers", L. C. Widdoes, Jr., COMPCON 80, Feb. 25-28, Digest of Papers, IEEE Catalog No. 80CH1491-OC, pp. 282-291. A difficulty with such prediction approaches is that they are relatively complex and require a significant amount of extra hardware.