1. Field of the Invention
The present invention relates to the field of microprocessors, and in particular, to systems and methods for branch prediction in microprocessors.
2. Background Art
Advanced processors employ pipelining techniques to execute instructions at very high speeds. On such processors, the overall machine is organized as a pipeline consisting of several cascaded stages of hardware. Instruction processing is divided into a sequence of operations, and each operation is performed by hardware in a corresponding pipeline stage ("pipe stage"). Independent operations from several instructions may be processed simultaneously by different pipe stages, increasing the instruction throughput of the pipeline. Where a pipelined processor includes multiple execution resources in each pipe stage, the throughput of the processor can exceed one instruction per clock cycle. To make full use of this instruction execution capability, the execution resources of the processor must be provided with sufficient instructions from the correct execution path.
Branch instructions pose major challenges to keeping the pipeline filled with instructions from the correct execution path. When a branch instruction is executed and the branch condition is met, control flow of the processor jumps to a new code sequence, and instructions from the new code sequence are transferred to the pipeline. Branch execution typically occurs in the back end of the pipeline, while instructions are fetched at the front end of the pipeline. If changes in the control flow are not anticipated correctly, several pipe stages' worth of instructions may be fetched from the wrong execution path by the time the branch is resolved. When this occurs, the instructions must be flushed from the pipeline, leaving idle pipe stages (bubbles) until the processor refills the pipeline with instructions from the correct execution path.
To reduce the number of pipeline bubbles, processors incorporate branch prediction modules at the front ends of their pipelines. When a branch instruction enters the front end of the pipeline, the branch prediction module forecasts whether the branch instruction will be taken when it is executed at the back end of the pipeline. If the branch is predicted taken, the branch prediction module indicates a target address to which control of the processor is predicted to jump. A fetch module, which is also located at the front end of the pipeline, fetches instructions beginning at the indicated target address.
Branch instructions are employed extensively in loops to execute a series of instructions ("the loop body") repeatedly. Modulo-scheduled loops are loops that are organized in a pipelined manner to improve execution efficiency. For one type of loop (top loop), a branch condition is tested following each iteration and control is returned to the first instruction of the loop body if the branch condition is met. The last iteration of the loop occurs when the branch condition is not met, in which case control of the processor passes ("falls through") to the instruction that follows the loop branch. Thus, the loop branch is taken for all but the final iteration of the top loop. Top loops terminate when the loop branch is not taken. Another type of loop (exit loop) employs a branch at a location other than the end of the loop body. In this case, the loop branch is not taken for all but the final iteration of the loop. Exit loops terminate when the loop branch is taken.
Loops are very common programming structures, and branch prediction systems are typically designed to predict the loop branch conditions correctly for the bulk of the loop iterations. For example, the branch prediction system may be set up to automatically predict top loop branches as taken and exit loop branches as not taken. This strategy provides accurate branch predictions for all but the last iteration of each loop, when the loop condition changes.
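By way of illustration only, the fixed prediction strategy described above may be sketched as follows. The function names are hypothetical and are not part of any described system; the sketch assumes a loop that runs at least one iteration and simply counts how often a fixed "top loops taken, exit loops not taken" prediction disagrees with the actual outcome of the loop branch.

```python
def loop_branch_outcomes(kind, iterations):
    """Actual outcome of the loop branch on each iteration (True = taken)."""
    if kind == "top":
        # Taken on every iteration except the last, which falls through.
        return [True] * (iterations - 1) + [False]
    if kind == "exit":
        # Not taken on every iteration except the last, which exits.
        return [False] * (iterations - 1) + [True]
    raise ValueError("kind must be 'top' or 'exit'")


def static_prediction(kind):
    """Fixed prediction: top loop branches taken, exit loop branches not taken."""
    return kind == "top"


def count_mispredictions(kind, iterations):
    """Number of iterations on which the fixed prediction is wrong."""
    predicted = static_prediction(kind)
    outcomes = loop_branch_outcomes(kind, iterations)
    return sum(1 for actual in outcomes if actual != predicted)
```

Under these assumptions, each complete execution of either loop type yields exactly one misprediction, on the terminal iteration, regardless of the loop count.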
Given the ubiquity of loop structures, mispredicting the loop branch on just the terminal iteration can have a significant impact on the overall performance of the processor. This is especially true where the loop is nested within an outer loop, when the loop count is small, or when the loop body is small. In the first case, the misprediction penalty associated with the terminal iteration of the inner loop is repeated for each iteration of the outer loop. In the latter cases, the misprediction penalty may exceed the total number of cycles necessary to execute the loop.
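The relative cost of the terminal misprediction may be estimated with simple arithmetic. The following sketch is illustrative only; the function name, its parameters, and the cycle counts used in the example are assumptions chosen for the illustration and are not measurements of any processor. It assumes one misprediction per execution of the inner loop and a fixed penalty per misprediction.

```python
def nested_loop_cycles(body_cycles, inner_iterations, outer_iterations, penalty_cycles):
    """Return (useful_cycles, wasted_cycles) for an inner loop nested in an outer loop.

    Assumes the inner loop branch is mispredicted exactly once per execution
    of the inner loop, i.e. once per outer iteration.
    """
    useful = body_cycles * inner_iterations * outer_iterations
    wasted = penalty_cycles * outer_iterations
    return useful, wasted


# Hypothetical numbers: a 2-cycle body, 4 inner iterations,
# 100 outer iterations, and a 10-cycle misprediction penalty.
useful, wasted = nested_loop_cycles(2, 4, 100, 10)
```

With these hypothetical numbers, the inner loop does 800 cycles of useful work while 1000 cycles are lost to mispredictions, so the penalty exceeds the cycles needed to execute the loop itself.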
The present invention addresses these and other limitations associated with available branch prediction systems.
The present invention provides a system and method for predicting loop branches, including the loop branch that terminates the loop.
In accordance with the present invention, a loop prediction system includes a counter module, a control module, and an end_of_loop (EOL) module. The counter tracks the number of loop branches that are in process. The control module determines when loop termination approaches, and switches the counter to track the number of loop branches that remain to be issued. The EOL module compares the number of loop branches that remain to be issued with a threshold value and generates a resteer signal when a match is detected.
For one embodiment of the invention, the counter is a dual mode counter that tracks the number of loop branches in process in a first mode and uses this number to track the number of loop branches that remain to be issued in the second mode. For another embodiment of the invention, the counter includes a first counter to track the number of loop branches in process and a second counter to track the number of loop branches that remain to be issued.
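By way of illustration only, the dual mode counter embodiment may be modeled in software as follows. This is a simplified sketch under stated assumptions, not a description of the actual hardware. The class and function names are hypothetical. The rule for switching modes (the number of loop branch executions still outstanding falls to a `window` value), the derivation of the remaining count (outstanding executions minus loop branches in process), the threshold of 1, the issue of one loop branch per cycle, and the fixed pipeline `depth` are all assumptions of the sketch.

```python
from collections import deque

IN_PROCESS, REMAINING = "in_process", "remaining"


class LoopPredictor:
    """Toy dual mode counter with an end-of-loop compare (hypothetical model)."""

    def __init__(self, threshold=1):
        self.threshold = threshold
        self.mode = IN_PROCESS
        self.count = 0  # meaning depends on the current mode

    def on_issue(self):
        """Front end issues a loop branch; returns True when a resteer is signaled."""
        if self.mode == IN_PROCESS:
            self.count += 1  # one more loop branch in process
            return False
        resteer = self.count == self.threshold  # EOL compare against threshold
        self.count -= 1  # one fewer loop branch remains to be issued
        return resteer

    def on_execute(self, remaining_executions, window):
        """Back end resolves a loop branch; the control logic may switch modes."""
        if self.mode != IN_PROCESS:
            return
        self.count -= 1  # one fewer loop branch in process
        if remaining_executions <= window:  # assumed "termination approaches" rule
            # Branches still to execute, minus those already in process,
            # gives the branches that remain to be issued.
            self.count = remaining_executions - self.count
            self.mode = REMAINING


def simulate(total_iterations, depth, window, threshold=1):
    """Return the ordinal of the issued loop branch that triggers the resteer,
    or None if this toy model never signals one."""
    predictor = LoopPredictor(threshold)
    in_pipe = deque()  # issue cycle of each loop branch still in process
    issued = executed = 0
    for cycle in range(total_iterations + depth + 1):
        if in_pipe and in_pipe[0] + depth <= cycle:
            in_pipe.popleft()
            executed += 1
            predictor.on_execute(total_iterations - executed, window)
        issued += 1
        in_pipe.append(cycle)
        if predictor.on_issue():
            return issued
    return None
```

In this model, `simulate(10, 3, 5)` signals the resteer on the tenth issued loop branch, which is the terminal iteration of a ten-iteration loop. The sketch does not handle a loop whose iteration count is no greater than the pipeline depth; in that case the front end has already issued every loop branch before the first one executes, and the model returns `None`.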