The present invention relates to computer systems and more particularly to a processor that performs branch prediction using first level and second level branch prediction tables.
Advanced processors employ pipelining techniques to execute instructions at very high speeds. On such processors, the overall machine is organized as multiple pipelines consisting of several cascaded stages of hardware. Instruction processing is divided into a sequence of operations, and each operation is performed by hardware in a corresponding pipeline stage (“pipe stage”). Independent operations from several instructions may be processed simultaneously by different pipe stages, increasing the instruction throughput of the processor. Where a pipelined processor includes multiple execution resources in each pipe stage, the throughput of the processor can exceed one instruction per clock cycle. To make full use of this instruction execution capability, the execution resources of the processor must be provided with sufficient instructions from the correct execution path.
In a typical computer system, an instruction pointer (IP) directs the processor from one instruction of the program code to the next instruction. An instruction might direct this IP to the next instruction in the normal program code sequence, or it may direct the IP to skip a portion of the program code and resume execution with a non-sequential instruction. The instruction that causes the processor to either continue executing the next instruction in sequence or “branch” to a different, non-sequential instruction is called a branch instruction.
For example, when a word processor does spell-checking, software instructions are executed to verify that each word is spelled correctly. As long as the words are spelled correctly, the instructions execute sequentially. Once an incorrectly spelled word is found, however, a branch instruction directs the IP to branch to a subroutine that notifies the user about the incorrectly spelled word. This subroutine is then executed by the processor.
Branch instructions pose major challenges to keeping the pipeline filled with instructions from the correct execution path. When a branch instruction is executed and the branch condition met, control flow of the processor jumps to a new code sequence, and instructions from the new code sequence are transferred to the pipeline. Branch execution typically occurs at the back end of the pipeline, while instructions are fetched at the front end of the pipeline. If instruction fetching relies on branch execution to determine the correct execution path, the processor pipeline may be filled with instructions from the wrong execution path before the branch condition is resolved. These instructions would then have to be flushed from the pipeline, leaving resources in the affected pipe stages idle while instructions from the correct execution path are fetched. The idle pipe stages are referred to as pipeline bubbles, since they provide no useful output until they are filled by instructions from the correct execution path.
Modern processors incorporate branch prediction modules at the front ends of their pipelines to reduce the number of pipeline bubbles. When a branch instruction enters the front end of the pipeline, the branch prediction module predicts whether the branch instruction will be taken when it is executed at the back end of the pipeline. If the branch is predicted taken (non-sequential instruction execution), the branch prediction module provides a branch target address to the instruction fetch module, redirecting the IP by setting the IP address equal to the address containing the first instruction of the branched program code. The address containing this first instruction of the branched code is called the “target address.” The fetch module, which is also located at the front end of the pipeline, begins fetching instructions from the target address. If, on the other hand, the branch predictor predicts that a branch will not be taken (sequential instruction execution), the branch predictor increments the IP address so that the IP points to the next instruction in the normal program code sequence. When branch execution occurs at the back end of the pipeline, the processor can validate whether the prediction made at the front end was correct. If the prediction was incorrect, the pipeline is flushed. The higher the branch prediction accuracy, the fewer the pipeline bubbles and flushes.
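The front-end redirection and back-end validation described above can be illustrated with a minimal sketch. The fixed instruction size and the function names `next_fetch_ip` and `must_flush` are assumptions made for illustration only and do not appear in the description.

```python
# Illustrative sketch only: a fixed 4-byte instruction size is assumed.
INSTRUCTION_SIZE = 4


def next_fetch_ip(ip, predicted_taken, target):
    """Front end: pick the address the fetch module should fetch from next."""
    if predicted_taken:
        return target                # resteer the IP to the target address
    return ip + INSTRUCTION_SIZE     # fall through to the next sequential instruction


def must_flush(predicted_taken, predicted_target, actual_taken, actual_target):
    """Back end: the pipeline is flushed when the front-end prediction was wrong."""
    if predicted_taken != actual_taken:
        return True                  # wrong direction
    # Right direction; a taken branch must also have the right target.
    return actual_taken and predicted_target != actual_target
```

For example, a branch at address 0x100 that is predicted taken to 0x200 causes fetching to continue from 0x200; if the back end later finds the branch was not taken, `must_flush` reports that the wrong-path instructions have to be discarded.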
Conventional branch prediction modules employ branch target buffers (BTBs) to store prediction entries containing information such as whether a branch will be taken and the likely target address when the branch is taken. These branch prediction entries are associated with the IP addresses that contain the branch instructions. For each IP address that is tracked in a branch prediction table, its associated branch prediction entry includes the IP address along with historical information that helps predict whether or not the branch will be taken in the future. However, even the process of looking up an instruction in the BTB, determining whether the branch is taken, and providing a target address to the fetch module on a taken prediction causes a delay in resteering the processor to the target address. This delay allows instructions from the wrong execution path to enter and propagate down the pipeline. Since these instructions do not add to forward progress on the predicted execution path, they create “bubbles” in the pipeline when they are flushed. More accurate and complete branch prediction algorithms (using larger branch tables) take longer to complete and generate greater delays in the resteer process. The greater the number of clock cycles required to resteer the pipeline, the greater the number of bubbles created in the pipeline. Thus there is a tradeoff between the access speed of the branch prediction structures and the size and accuracy of the content in these structures.
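A BTB entry of the kind described above might be modeled as follows. This is a sketch under stated assumptions: the direct-mapped indexing, the 2-bit saturating counter used as the historical information, and the names `BTBEntry` and `BranchTargetBuffer` are illustrative choices, not details taken from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTBEntry:
    tag: int          # IP address of the tracked branch instruction
    target: int       # likely target address when the branch is taken
    counter: int = 2  # assumed history: 2-bit saturating counter, >= 2 means "taken"


class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB indexed by the low bits of the IP."""

    def __init__(self, num_entries: int = 16):
        self.num_entries = num_entries
        self.entries: List[Optional[BTBEntry]] = [None] * num_entries

    def _index(self, ip: int) -> int:
        return ip % self.num_entries

    def predict(self, ip: int) -> Optional[int]:
        """Return the target if the IP is tracked and predicted taken, else None."""
        entry = self.entries[self._index(ip)]
        if entry is not None and entry.tag == ip and entry.counter >= 2:
            return entry.target
        return None

    def update(self, ip: int, taken: bool, target: int) -> None:
        """Record the outcome resolved at the back end of the pipeline."""
        idx = self._index(ip)
        entry = self.entries[idx]
        if entry is None or entry.tag != ip:
            # Allocate a new entry, overwriting whatever occupied the slot.
            self.entries[idx] = BTBEntry(ip, target, 2 if taken else 1)
        elif taken:
            entry.counter = min(3, entry.counter + 1)
            entry.target = target
        else:
            entry.counter = max(0, entry.counter - 1)
```

A larger `num_entries` tracks more branches, but in hardware a larger table generally takes longer to access, which is the tradeoff the paragraph above describes.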
For speed and cost reasons, modern processors often limit the size of the BTB employed. This reduces the accuracy of the branch detection and prediction, especially on large workloads. Given the smaller size of the BTB, a new branch prediction entry sometimes must overwrite an older branch prediction entry. If a branch instruction associated with an overwritten branch prediction entry is then re-executed by the processor, no historical information exists to help the branch predictor predict whether or not the branch should be taken. As a result, branch prediction accuracy decreases, reducing processor performance. As the size of software applications increases, the number of branch instructions in those applications increases, and the limited size of the branch prediction table becomes a significant problem. Thus there is a need to provide a solution that yields low latency branch predictions for the most frequent subset of branches (those with high locality), and yet provides meaningful predictions for the overall working set.
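The loss of history caused by overwriting can be made concrete with a small sketch. It assumes a direct-mapped table in which each slot remembers only the IP of the branch it currently tracks; the function name `count_lost_histories` and the example addresses are hypothetical.

```python
def count_lost_histories(branch_ips, table_size):
    """Count re-executed branches whose entry was overwritten before reuse."""
    table = [None] * table_size  # each slot holds the IP of the branch it tracks
    seen = set()                 # branches that have executed at least once
    lost = 0
    for ip in branch_ips:
        slot = ip % table_size
        if table[slot] != ip:
            if ip in seen:
                lost += 1        # tracked earlier, but its entry was overwritten
            table[slot] = ip     # install, overwriting any older entry
        seen.add(ip)
    return lost


# Four branches executed twice in a loop.
trace = [0, 1, 2, 3] * 2
small = count_lost_histories(trace, table_size=2)  # every re-execution finds its entry gone
large = count_lost_histories(trace, table_size=4)  # every branch keeps its entry
```

With the two-entry table, all four re-executed branches find their history overwritten; with the four-entry table, none do. The same effect appears as the number of branches in a workload grows past the capacity of the table.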
A branch predictor is described. A first branch prediction table is coupled to an IP generator to store branch prediction entries. A second branch prediction table is also coupled to the IP generator to store a greater number of branch prediction entries.
In accordance with an embodiment of the present invention, the two-level branch prediction structure may be found to combine the benefits of high-speed (low-latency) branch prediction and resteering for the branches with the highest locality, with high-accuracy branch detection and prediction for the overall working set, albeit at reduced speed. This may be accomplished without significant die size growth.
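One way to picture the two-level arrangement is the sketch below. It is not the claimed implementation: the capacities, the latencies of 1 and 3 cycles, the least-recently-used replacement, the promotion of second-level hits into the first level, and the class name `TwoLevelPredictor` are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TwoLevelPredictor:
    """Hypothetical small, fast first table backed by a larger, slower second table."""

    def __init__(self, l1_capacity=4, l2_capacity=64, l1_latency=1, l2_latency=3):
        self.l1 = OrderedDict()  # branch IP -> target, kept in LRU order
        self.l2 = OrderedDict()  # holds a greater number of entries
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity
        self.l1_latency = l1_latency  # assumed cycles to resteer on a first-level hit
        self.l2_latency = l2_latency  # assumed cycles to resteer on a second-level hit

    @staticmethod
    def _install(table, capacity, ip, target):
        table[ip] = target
        table.move_to_end(ip)
        if len(table) > capacity:
            table.popitem(last=False)  # evict the least recently used entry

    def record_taken(self, ip, target):
        """Record a taken branch in both tables (assumed allocation policy)."""
        self._install(self.l2, self.l2_capacity, ip, target)
        self._install(self.l1, self.l1_capacity, ip, target)

    def lookup(self, ip):
        """Return (target, latency); target is None when neither table tracks the IP."""
        if ip in self.l1:
            self.l1.move_to_end(ip)
            return self.l1[ip], self.l1_latency
        if ip in self.l2:
            target = self.l2[ip]
            self._install(self.l1, self.l1_capacity, ip, target)  # promote on hit
            return target, self.l2_latency
        return None, self.l2_latency
```

In this sketch, branches with high locality stay in the first table and are predicted at the lower latency, while a branch evicted from the first table can still be predicted from the second table at the higher latency rather than losing its entry altogether.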
Other features and advantages of the present invention will be apparent from the accompanying drawings and the detailed description that follows.