Microprocessors often employ pipelining to enhance performance. Within a pipelined microprocessor, the functional units necessary for executing different stages of an instruction operate simultaneously on multiple instructions to achieve a degree of parallelism, leading to performance increases over non-pipelined microprocessors. As an example, an instruction fetch unit, an instruction decode unit and an instruction execution unit may operate simultaneously. During one clock cycle, the instruction execution unit executes a first instruction while the instruction decode unit decodes a second instruction and the fetch unit fetches a third instruction. During a next clock cycle, the execution unit executes the newly decoded instruction while the instruction decode unit decodes the newly fetched instruction and the fetch unit fetches yet another instruction. In this manner, neither the fetch unit nor the decode unit needs to wait for the instruction execution unit to execute the last instruction before processing new instructions. In state-of-the-art microprocessors, the steps necessary to fetch and execute an instruction are sub-divided into a larger number of stages to achieve a deeper degree of pipelining.
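The overlap described above can be illustrated with a minimal sketch of a three-stage pipeline. The function names `simulate_pipeline` and `unpipelined_cycles` are hypothetical, and the sketch assumes an idealized machine in which every stage takes exactly one clock cycle and no stalls occur.

```python
def simulate_pipeline(instructions, stages=("fetch", "decode", "execute")):
    """Return, for each clock cycle, the instruction occupying each stage.

    Idealized model (an assumption): one cycle per stage, no stalls.
    """
    timeline = []
    total_cycles = len(instructions) + len(stages) - 1
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            # The instruction at index i enters the first stage in cycle i,
            # so the stage at position `depth` holds instruction cycle - depth.
            index = cycle - depth
            row[stage] = instructions[index] if 0 <= index < len(instructions) else None
        timeline.append(row)
    return timeline


def unpipelined_cycles(n_instructions, n_stages=3):
    """Cycles needed when each instruction passes through all stages alone."""
    return n_instructions * n_stages


program = ["I1", "I2", "I3", "I4"]
timeline = simulate_pipeline(program)

for cycle, row in enumerate(timeline, start=1):
    print(cycle, row)
print("pipelined cycles:", len(timeline))
print("unpipelined cycles:", unpipelined_cycles(len(program)))
```

In the third cycle of this sketch the execute stage holds the first instruction while the decode stage holds the second and the fetch stage holds the third, matching the example given above.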
A pipelined CPU operates most efficiently when the instructions are executed in the sequence in which the instructions appear in memory. Unfortunately, this is typically not the case. Rather, computer programs typically include a large number of branch instructions, which, upon execution, may cause instructions to be executed in a sequence other than as set forth in memory. More specifically, when a branch instruction is encountered in the program flow, execution either continues with the next sequential instruction from memory or jumps to an instruction specified at a “branch target” address. Typically, the branch specified by the instruction is said to be “Taken” if execution jumps and “Not Taken” if execution continues with the next sequential instruction from memory.
Branch instructions are either unconditional, meaning the branch is taken every time the instruction is executed, or conditional, meaning the branch is taken or not depending upon a condition. Instructions to be executed following a conditional branch are not known with certainty until the condition upon which the branch depends is resolved. However, rather than wait until the condition is resolved, state-of-the-art microprocessors may perform a branch prediction, whereby the microprocessor tries to determine whether the branch will be Taken or Not Taken and, if Taken, to predict the target address for the branch. If the branch is predicted to be Taken, the microprocessor fetches and speculatively executes the instruction found at the predicted branch target address. The instructions executed following the branch prediction are “speculative” because the microprocessor does not yet know whether the prediction will be correct or not. Accordingly, any operations performed by the speculative instructions cannot be fully completed.
For example, if a memory write operation is performed speculatively, the write operation cannot be forwarded to external memory until all previous branch conditions are resolved; otherwise, the instruction may improperly alter the contents of the memory based on a mispredicted branch. If the branch prediction is ultimately determined to be correct, the speculatively executed instructions are retired or otherwise committed to a permanent architectural state. In the case of a memory write, the write operation is then forwarded to external memory. If the branch prediction is ultimately found to be incorrect, then any speculatively executed instructions following the mispredicted branch are typically flushed from the system. For the memory write example, the write is not forwarded to external memory, but instead is discarded.
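The handling of a speculative memory write may be sketched as follows. The class `SpeculativeStoreBuffer` and its methods are hypothetical names introduced only for illustration, and the sketch assumes that a dictionary stands in for external memory and that each buffered write records the branches still unresolved when it was issued.

```python
class SpeculativeStoreBuffer:
    """Holds speculative writes until every earlier branch is resolved.

    Hypothetical model for illustration; not the design of any real processor.
    """

    def __init__(self, memory):
        self.memory = memory    # dictionary standing in for external memory
        self.unresolved = []    # predicted branches, in program order
        self.pending = []       # (blocking branch ids, address, value)

    def predict_branch(self, branch_id):
        self.unresolved.append(branch_id)

    def store(self, address, value):
        # The write depends on every branch that is still unresolved.
        self.pending.append((set(self.unresolved), address, value))
        self._drain()

    def resolve(self, branch_id, prediction_correct):
        position = self.unresolved.index(branch_id)
        if prediction_correct:
            del self.unresolved[position]
            for blocking, _, _ in self.pending:
                blocking.discard(branch_id)
        else:
            # Flush the mispredicted branch, all younger branches,
            # and every write issued under the misprediction.
            del self.unresolved[position:]
            self.pending = [e for e in self.pending if branch_id not in e[0]]
        self._drain()

    def _drain(self):
        # Forward writes to memory in order once nothing blocks them.
        while self.pending and not self.pending[0][0]:
            _, address, value = self.pending.pop(0)
            self.memory[address] = value


memory = {0x10: 1}
buf = SpeculativeStoreBuffer(memory)

buf.predict_branch("B1")
buf.store(0x10, 99)                          # speculative, held in the buffer
buf.resolve("B1", prediction_correct=False)  # mispredicted: write discarded

buf.predict_branch("B2")
buf.store(0x20, 7)
buf.resolve("B2", prediction_correct=True)   # correct: write forwarded
print(memory)
```

In this sketch the write issued under the mispredicted branch never reaches memory, while the write issued under the correctly predicted branch is forwarded only after that branch is resolved.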
A wide variety of techniques have been developed for performing branch prediction. Typically, various tables are provided for storing a history of previous branch predictions along with previous branch targets. However, advanced compiler techniques now permit the same branch instruction to have different branch target addresses when the branch instruction is repeatedly used in a sequence of instructions. As a result, if a branch prediction unit predicts a branch target address for a branch operation based on a previous branch target for that branch operation, there is a strong chance that the predicted branch target will be incorrect.
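The weakness of predicting a target from a previous target can be illustrated with a minimal sketch. The class `LastTargetPredictor`, the function `count_mispredictions`, and the addresses in the two traces are all hypothetical; the sketch assumes a table that simply remembers the most recent target seen for each branch address, which is one simple form a history table could take.

```python
class LastTargetPredictor:
    """Predicts that a branch goes where it went the previous time.

    Hypothetical illustration of history-based target prediction.
    """

    def __init__(self):
        self.table = {}  # branch address -> last observed target address

    def predict(self, branch_pc):
        return self.table.get(branch_pc)  # None when no history exists

    def update(self, branch_pc, actual_target):
        self.table[branch_pc] = actual_target


def count_mispredictions(predictor, trace):
    """Replay (branch address, actual target) pairs and count wrong targets."""
    misses = 0
    for branch_pc, actual_target in trace:
        if predictor.predict(branch_pc) != actual_target:
            misses += 1
        predictor.update(branch_pc, actual_target)
    return misses


# Made-up traces: one branch with a fixed target, one whose target alternates.
stable_trace = [(0x400, 0x500)] * 6
alternating_trace = [(0x400, 0x500), (0x400, 0x600)] * 3

stable_misses = count_mispredictions(LastTargetPredictor(), stable_trace)
alternating_misses = count_mispredictions(LastTargetPredictor(), alternating_trace)
print(stable_misses, alternating_misses)
```

Under these assumptions the branch with a fixed target is mispredicted only on its first execution, whereas the branch whose target alternates is mispredicted on every execution, because the previous target is never the next one.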
Therefore, it is desirable to provide a method and apparatus able to predict the branch target address for an impending branch operation based on the branch target address specified by the impending branch operation.