1. Field of the Invention
The present invention relates to computational circuits and, more specifically, to a circuit that predicts whether a conditional branch is taken.
2. Description of the Prior Art
Many modern computing systems use a processor having a pipelined architecture to increase instruction throughput. In theory, pipelined processors can execute one instruction per machine cycle when a well-ordered, sequential instruction stream is being executed. This is accomplished even though the instruction itself may implicate or require a number of separate microinstructions to be executed. Pipelined processors operate by breaking up the execution of an instruction into several stages that each require one machine cycle to complete. Latency is reduced in pipelined processors by initiating the processing of a second instruction before the actual execution of the first instruction is completed. In fact, multiple instructions can be in various stages of processing at any given time. Thus, the overall instruction execution latency of the system (which, in general, can be thought of as the delay between the time a sequence of instructions is initiated, and the time it is finished executing) can be significantly reduced.
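The throughput benefit described above can be illustrated with a simple cycle count. The following is a minimal sketch, assuming an idealized pipeline in which every stage takes exactly one machine cycle and no stalls occur; the function names and parameters are hypothetical and chosen only for illustration.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages
    before the next instruction starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when a new instruction enters the pipeline every
    cycle (idealized: sequential stream, no stalls)."""
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles to complete; each
    # later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, stages = 100, 5
    print(unpipelined_cycles(n, stages))  # 500
    print(pipelined_cycles(n, stages))    # 104
```

Under these assumptions, the pipelined count approaches one cycle per instruction as the instruction stream grows longer.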
Pipelining works well when program execution follows a sequential model, in which each instruction to be executed is the one immediately following, in memory, the instruction just executed. A critical requirement and feature of programs, however, is that they have the ability to “branch” or re-direct program execution flow to another set of instructions. Using branch instructions, a conditional transfer of control can be made to a path in the executing program different from the current one. This path does not always coincide with the set of instructions immediately following the instruction that was just executed.
Branch instructions can occur arbitrarily within any particular program, and it is not possible to predict with certainty ahead of time whether program flow will be re-directed. Various techniques are known in the art for guessing about the outcome of a branch instruction, so that, if flow is to be directed to another set of instructions, the correct target address can be pre-calculated, and a corresponding set of data can be prefetched and loaded in advance from memory to reduce memory access latencies.
Sometimes, however, the guess about the branch outcome is incorrect, and this can cause a “bubble,” or pipeline stall. A bubble or stall occurs when the pipeline contains instructions that do not represent the desired program flow (e.g., as a result of an incorrectly predicted branch outcome). A significant time penalty is then incurred from having to squash the erroneous instructions, flush the pipeline and re-load it with the correct instruction sequence. Depending on the depth of the pipeline, this penalty can be quite large.
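The cost of an incorrect guess can be approximated by adding a fixed flush penalty for each mispredicted branch to the ideal pipelined cycle count. The following is a minimal sketch under stated assumptions: one cycle per stage, a constant penalty per misprediction, and no other sources of stalls. The function name and parameters are hypothetical.

```python
def cycles_with_mispredictions(num_instructions: int,
                               num_stages: int,
                               num_mispredictions: int,
                               flush_penalty: int) -> int:
    """Estimate total cycles for an instruction stream in which each
    mispredicted branch costs a fixed number of extra cycles."""
    if num_instructions == 0:
        return 0
    # Ideal pipelined execution: fill the pipeline once, then complete
    # one instruction per cycle.
    ideal = num_stages + (num_instructions - 1)
    # Each misprediction discards wrongly fetched instructions and
    # refills the pipeline, modeled here as a constant penalty.
    return ideal + num_mispredictions * flush_penalty


if __name__ == "__main__":
    # 100 instructions, 5 stages, 4 mispredicted branches, 3-cycle penalty.
    print(cycles_with_mispredictions(100, 5, 4, 3))  # 116
```

In this simplified model the penalty grows linearly with the flush cost, which is why deeper pipelines tend to be more sensitive to prediction accuracy.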
Various mechanisms have been proposed for minimizing the actual execution time latency for branch instructions. For example, one approach is to compute the branch address while the branch instruction is decoded. This can reduce the average number of cycles per branch instruction, but comes at the cost of an additional address adder that consumes additional area and power.
Another approach uses a target instruction history buffer. An example of this is shown in U.S. Pat. Nos. 4,725,947, 4,763,245 and 5,794,027, which are incorporated by reference. In this type of system, each target instruction entry in a branch history table is associated with the program counter of a branch instruction executed in the past. When a branch is executed, an entry is filled with the appropriate target instruction. The next time the branch is in the decoding stage, the branch target instruction can be prepared by matching the program counter to the corresponding entry in the branch history table. To increase the useful hit ratio of this approach, a large number of entries must be kept in the table. This requires an undesirable amount of silicon area and power. Moreover, the matching mechanism itself can be a potential source of delay.
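The general idea of a table keyed by the program counter can be sketched in software as follows. This is an illustrative model only, not the structure disclosed in the cited patents: it assumes a direct-mapped table indexed by low-order program counter bits, 4-byte instruction alignment, and a full program counter stored as the tag. The class and method names are hypothetical.

```python
from typing import List, Optional, Tuple


class TargetInstructionBuffer:
    """Toy direct-mapped branch history table holding target instructions."""

    def __init__(self, num_entries: int) -> None:
        self.num_entries = num_entries
        # Each slot holds (branch_pc, target_instruction) or None if empty.
        self.entries: List[Optional[Tuple[int, str]]] = [None] * num_entries

    def _index(self, pc: int) -> int:
        # Assumption: instructions are 4-byte aligned, so the two lowest
        # bits carry no information and are dropped before indexing.
        return (pc >> 2) % self.num_entries

    def fill(self, pc: int, target_instruction: str) -> None:
        """Record the target instruction when a branch is executed."""
        self.entries[self._index(pc)] = (pc, target_instruction)

    def lookup(self, pc: int) -> Optional[str]:
        """Return the stored target instruction when the program counter
        matches an entry, or None on a miss."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None


if __name__ == "__main__":
    table = TargetInstructionBuffer(num_entries=4)
    table.fill(0x100, "add r1, r2, r3")
    print(table.lookup(0x100))  # hit: add r1, r2, r3
    print(table.lookup(0x104))  # miss: None
    table.fill(0x110, "sub r4, r5, r6")  # maps to the same slot as 0x100
    print(table.lookup(0x100))  # miss after eviction: None
```

The eviction in the last step shows why a small table yields a low hit ratio: branches whose program counters map to the same slot displace one another, so more entries are needed to keep frequently executed branches resident.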
Therefore, there is a need for a system of predicting branches that provides a branch indicator for every conditional branch instruction.