1. Field of the Invention
The present invention relates generally to branch prediction in a microprocessor system and, more particularly, to a method and apparatus for speculatively buffering branch targets.
2. Description of the Related Art
The performance of a microprocessor is directly related to the amount of time it is busy executing instructions. It achieves maximum performance if it never sits idle waiting on fetches from memory or I/O. The microprocessor has a prefetch unit that has the responsibility of keeping the execution unit as busy as possible by providing a constant flow of instructions. The prefetch unit is responsible for keeping enough instructions on hand so the microprocessor does not stop its execution flow to fetch an instruction from memory. This look-ahead feature can significantly increase performance, because much of the time, the next instruction is already waiting at the first stage of the microprocessor execution pipeline. If instructions are sequentially stored, prefetching almost guarantees that the next instruction will always be ready.
However, instruction sequences are not always stored in memory sequentially. Software contains branches or jumps in instruction flow that cause the microprocessor to jump around to different sections of code depending on the task being executed. The prefetch unit can keep track of the current instruction flow, but it cannot predict the future path of branch instructions.
Performance of the microprocessor is further enhanced by a branch prediction unit that works in concert with the prefetch unit. The branch prediction unit, as its name suggests, attempts to predict whether a branch will be taken. As long as the branch prediction unit predicts correctly, the prefetch unit retrieves instructions to be executed in the required order.
In a microprocessor such as Intel's Pentium Pro microprocessor, the branch prediction unit includes a dynamic predictor, such as a branch target buffer, that stores branch history information based on the instruction address of the branch instruction. The branch target buffer tracks the past behavior of branches through history bits. The branch target buffer can track a branch instruction only after it has been previously seen and an entry has been allocated for the instruction address of the branch. Branch prediction typically occurs at the beginning of the microprocessor pipeline, and branch target buffer allocation typically occurs near the end of the pipeline, after the branch is known to be in the correct path of the executing program and the branch is resolved. Therefore, the first time a branch instruction at a certain address is encountered, the branch target buffer does not know that the instruction is indeed a branch, because the instruction has not been previously executed and no entry has been allocated for it.
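The lookup and allocation behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the Pentium Pro design: the class name `BranchTargetBuffer`, the methods `predict` and `retire`, the dictionary storage, and the use of a 2-bit saturating counter for the "history bits" are all assumptions made for the example.

```python
class BranchTargetBuffer:
    """Toy branch target buffer keyed by branch instruction address.

    Hypothetical model: history bits are assumed to be a 2-bit saturating
    counter (0..3), where values of 2 or more mean "predict taken".
    """

    def __init__(self):
        # address -> [counter, target address]
        self.entries = {}

    def predict(self, address):
        """Return (hit, predicted_taken, target) for a fetch address."""
        entry = self.entries.get(address)
        if entry is None:
            # Never allocated: the buffer cannot tell this is a branch.
            return (False, False, None)
        counter, target = entry
        return (True, counter >= 2, target)

    def retire(self, address, taken, target):
        """Allocate or update an entry after the branch resolves and retires."""
        entry = self.entries.get(address)
        if entry is None:
            # Assumed initial state: weakly agree with the first outcome.
            self.entries[address] = [2 if taken else 1, target]
            return
        if taken:
            entry[0] = min(3, entry[0] + 1)
            entry[1] = target
        else:
            entry[0] = max(0, entry[0] - 1)


btb = BranchTargetBuffer()
first_lookup = btb.predict(0x400)            # miss: branch not yet allocated
btb.retire(0x400, taken=True, target=0x3F0)  # allocation happens at retirement
second_lookup = btb.predict(0x400)           # hit, predicted taken
```

In this sketch the first lookup misses even though the address holds a branch, because an entry exists only after `retire` has been called for that address.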
A second branch prediction unit, a static predictor, receives information on the decoded instructions and can therefore identify a branch instruction that is not detected by the branch target buffer. The static predictor can identify the type of branch instruction and possibly the branch target address. The static predictor performs a static branch prediction based on a set of rules depending on the type of branch instruction encountered. A branch missing the branch target buffer will be statically predicted by the static predictor. The static predictor is also capable of correcting errors made by the branch target buffer. Because the branch target buffer tracks branches by instruction address rather than by the actual instruction, and because process switches and self-modifying code can change the instruction stored at a particular address, the branch target buffer may identify a branch at an address that no longer holds one. Branches that are incorrectly identified by the branch target buffer and branches that are missed by the branch target buffer can be corrected by the static predictor.
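A rule-based static predictor of this kind might look like the following minimal sketch. The text states only that backward conditional branches are predicted taken; the other rules here (forward conditional branches predicted not taken, unconditional branches predicted taken) are common conventions assumed for illustration, and the function names `static_predict` and `final_prediction` are hypothetical.

```python
def static_predict(kind, address, target):
    """Return True if the branch is statically predicted taken.

    kind: "unconditional" or "conditional" (as reported by the decoder).
    Assumed rules: unconditional -> taken; conditional -> taken only when
    the target lies before the branch (a backward branch).
    """
    if kind == "unconditional":
        return True
    if kind == "conditional":
        return target is not None and target < address
    raise ValueError(f"unknown branch kind: {kind!r}")


def final_prediction(btb_hit, btb_taken, decoded_kind, address, target):
    """Combine the dynamic and static predictions after decode.

    decoded_kind is None when the decoder finds that the instruction is
    not a branch at all.
    """
    if decoded_kind is None:
        # The buffer flagged a non-branch (for example, after the code at
        # this address changed): override with "not taken".
        return False
    if btb_hit:
        # The buffer knows this branch: keep the dynamic prediction.
        return btb_taken
    # Buffer miss: fall back to the static rules.
    return static_predict(decoded_kind, address, target)
```

For example, `final_prediction(False, False, "conditional", 0x400, 0x3F0)` falls through to the static rules and predicts the backward branch as taken, while `final_prediction(True, True, None, 0x400, None)` discards a bogus buffer hit on a non-branch instruction.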
Due to the lag between encountering a branch instruction for the first time and allocating the branch in the branch target buffer after it is executed and retired, a particular branch may be encountered and statically predicted multiple times before it is ever allocated in the branch target buffer. In the case of a backward conditional branch, the static predictor will predict the branch as taken, so the instructions following the branch that were fetched prior to the static prediction are flushed from the pipeline. Because the branch instruction is not allocated in the branch target buffer until after the instruction is retired, this sequence of static prediction followed by a flush may be repeated multiple times before the first occurrence of the branch is allocated, especially for a short loop.
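The repeated flushes can be illustrated with a rough timing model. This is a simplified sketch under stated assumptions, not a description of any real pipeline: `retire_lag` is a hypothetical number of cycles between the first fetch of the branch and its allocation in the branch target buffer, `iteration_period` is a hypothetical number of cycles between successive fetches of the same loop branch, and the cycle counts in the examples are invented purely for illustration.

```python
def static_predictions_before_allocation(retire_lag, iteration_period):
    """Count fetches of a loop branch that miss the branch target buffer.

    Assumed model: the branch is fetched at cycles 0, iteration_period,
    2 * iteration_period, ... and its entry becomes visible at cycle
    retire_lag. Every fetch before that point is statically predicted
    and, for a backward branch predicted taken, costs one flush.
    """
    if retire_lag <= 0 or iteration_period <= 0:
        raise ValueError("both arguments must be positive")
    count = 0
    fetch_time = 0
    while fetch_time < retire_lag:
        count += 1
        fetch_time += iteration_period
    return count


short_loop = static_predictions_before_allocation(12, 4)      # 3
deeper_pipeline = static_predictions_before_allocation(20, 4)  # 5
long_loop = static_predictions_before_allocation(12, 16)       # 1
```

Under these assumptions, a short loop sees several static predictions (and flushes) before the first allocation, a longer allocation lag increases that count, and a loop whose iteration period exceeds the lag sees only one.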
Flush cycles have a significant impact on processor performance. As processors become faster, the impact of flush cycles increases. Faster processors typically require deeper pipelines, thus increasing the lag between encountering a branch instruction and allocating the instruction address in the branch target buffer.
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above by providing a novel and nonobvious method and apparatus for speculatively buffering branch targets.