1. Technical Field of the Invention
The present invention generally relates to the field of data processing systems, and more particularly to such systems that process instruction streams that include conditional branch instructions.
2. Description of the Prior Art
In most general purpose, stored program, digital computers, software is developed under the assumption that program instructions are executed in their entirety in a sequential fashion. This frees the software developer from the need to account for potential non-sequential operation of the hardware. However, most large-scale, modern machines are designed to take advantage of the overlapping of various functions. In its simplest form, such overlapping permits instruction processing of the N+1st instruction to be performed during operand processing of the Nth instruction. U.S. Pat. No. 4,890,225 issued to Ellis, Jr. et al. shows a rudimentary overlapped machine. To free the software developer from concerns about non-sequentiality, Ellis, Jr. et al. store the machine state during the complete execution of the Nth instruction. U.S. Pat. No. 4,924,376 issued to Ooi provides a technique for resource allocation in an overlapped environment.
A more general form of overlapping is termed a pipelined environment. In implementing such a machine, the designer dedicates certain hardware resources to the various repetitive tasks. Overall system performance is improved by breaking the pipeline into these many dedicated hardware elements, or stages. The performance advantage in this dedication comes from employing these dedicated hardware elements simultaneously. Typically, this means that instruction fetch, decode, operand fetch, and arithmetic operations each have separate and dedicated hardware resources. Even though the Nth instruction is processed by each of these hardware resources sequentially, each separate hardware resource is deployed on a different instruction simultaneously. The N+1st instruction may be preprocessed by the instruction fetch and decode hardware, while the Nth instruction is being processed by the operand fetch hardware and while the N−1st instruction is being processed by the arithmetic hardware. U.S. Pat. No. 4,855,904 issued to Daberkow et al. describes a pipelined architecture.
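The overlap described above may be illustrated by a minimal sketch. It assumes an idealized four-stage pipeline in which every stage takes exactly one clock cycle and no stalls occur; the stage names follow the text, while the function names `pipeline_schedule` and `sequential_cycles` are hypothetical and chosen only for this illustration.

```python
# Idealized model: one clock cycle per stage, no hazards or stalls (assumption).
STAGES = ["fetch", "decode", "operand_fetch", "arithmetic"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    A stage holding no instruction in a given cycle maps to None.
    """
    depth = len(stages)
    total_cycles = num_instructions + depth - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for position, name in enumerate(stages):
            # Instruction i enters stage `position` at cycle i + position.
            instr = cycle - position
            row[name] = instr if 0 <= instr < num_instructions else None
        schedule.append(row)
    return schedule


def sequential_cycles(num_instructions, stages=STAGES):
    """Cycles needed if each instruction finishes all stages before the next starts."""
    return num_instructions * len(stages)


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(5)):
        print(cycle, row)
    print("pipelined cycles:", len(pipeline_schedule(5)))
    print("sequential cycles:", sequential_cycles(5))
```

Under these assumptions, once the pipeline is full each stage holds a different instruction in the same cycle, so N instructions complete in N + 3 cycles rather than 4N.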
Pipelined architectures work most efficiently when each stage of the pipeline is filled with a program instruction so that the pipeline generates useful output during every clock cycle. However, program instructions do not always proceed in a linear sequence. A program may contain various changes, or branches, that alter the program flow. For instance, “if A=B, then branch to instruction C” is an example of a conditional branch instruction that alters the stream of the program to instruction C if A=B. However, if the conditional branch instruction is fetched before the condition of “A=B” is determined, the processor cannot determine which branch to take. Without some alternative method of predicting or pre-calculating the resolution of the condition, the processor must stall the pipeline until the condition is determined. Stalling is undesirable because it wastes processor resources and thus adversely affects processor efficiency.
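The cost of stalling may be estimated with a rough sketch. It assumes a pipeline of `depth` single-cycle stages, that a branch condition is resolved when the branch completes stage `resolve_stage`, that no further instruction is fetched until then, and that every branch is followed by at least one instruction. The function name and all parameter names are hypothetical and are not taken from any of the cited patents.

```python
def cycles_with_stalls(num_instructions, num_branches, depth=4, resolve_stage=4):
    """Estimate total cycles when fetch stalls on every conditional branch.

    Assumptions (illustrative only): one cycle per stage, and the instruction
    after a branch cannot be fetched until the branch has completed stage
    `resolve_stage` (1-based).
    """
    if not 1 <= resolve_stage <= depth:
        raise ValueError("resolve_stage must lie within the pipeline")
    # Without a stall the next fetch happens one cycle after the branch fetch;
    # with a stall it waits resolve_stage cycles, leaving this many idle cycles.
    stall_per_branch = resolve_stage - 1
    ideal = num_instructions + depth - 1
    return ideal + num_branches * stall_per_branch


if __name__ == "__main__":
    print("no branches:", cycles_with_stalls(100, 0))
    print("20 branches:", cycles_with_stalls(100, 20))
```

In this model each unresolved branch leaves `resolve_stage - 1` fetch slots empty, which is the wasted processor resource referred to above.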
Various processors use branch prediction logic to help overcome the problem of stalling the pipeline. With branch prediction, the processor will guess which way the branch condition will be resolved. Often this guess is based on previous history. If the branch prediction logic predicts that a condition will be met, then the processor will process a certain set of target instructions. If the branch prediction logic predicts that a condition will not be met, then the processor will process the sequential instructions, that is, the instructions following the branch instruction. If the branch prediction logic correctly predicts the branch condition, the processor will not stall. If, however, the prediction is incorrect, the instructions in the pipeline must be flushed and the correct instructions fetched. The processor may require several clock cycles to flush the pipeline and fetch the correct instructions. Therefore, a misprediction can reduce the efficiency of the processor.
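History-based prediction of the kind described above may be sketched as follows. The text does not name a particular scheme; the two-bit saturating counter used here is a commonly known technique substituted purely for illustration. The class name, the function name, and the `flush_penalty` of three cycles are assumptions.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 2  # start in the "weakly taken" state (arbitrary choice)

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes, flush_penalty=3):
    """Return (mispredictions, cycles lost) for a sequence of actual branch outcomes.

    flush_penalty is a hypothetical number of cycles spent flushing the
    pipeline and fetching the correct instructions after a wrong guess.
    """
    predictor = TwoBitPredictor()
    mispredictions = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            mispredictions += 1
        predictor.update(taken)
    return mispredictions, mispredictions * flush_penalty


if __name__ == "__main__":
    loop_branch = [True] * 9 + [False]  # taken nine times, then falls through
    print(count_mispredictions(loop_branch))
    print(count_mispredictions([True, False] * 4))
```

A branch with a stable history, such as a loop branch, is mispredicted rarely under this scheme, whereas a branch whose outcome alternates is mispredicted often, and each misprediction incurs the flush penalty.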