1. Field of the Invention
The present invention relates generally to branch target address prediction in a computer system, and more specifically to a method and an apparatus for predicting the target of a branch instruction by indexing a translation lookaside buffer to retrieve a page number portion of a predicted branch target address.
2. Related Art
Early computers generally processed instructions one at a time, with each instruction being processed in four sequential stages: instruction fetch, instruction decode, execute and result write-back. Within such early computers, different logic blocks performed each processing stage, and each logic block waited until all the preceding logic blocks completed before performing its operation.
To improve efficiency, processor designers now overlap operations of the processing stages. This enables a processor to operate on several instructions simultaneously. During a given time period, the fetch, decode, execute and write-back logic stages process different sequential instructions in a computer's instruction stream at the same time. At the end of each clock period, the result of each processing stage proceeds to the next processing stage.
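The overlap described above can be illustrated with a minimal Python sketch of an idealized four-stage pipeline. It assumes that one instruction enters the fetch stage per clock period and that nothing ever stalls. The function name `pipeline_schedule` is hypothetical and is used only for illustration.

```python
"""Sketch of an idealized four-stage pipeline.

Shows which instruction occupies each stage in each clock period, assuming one
instruction enters per period and no stalls (an illustrative model only).
"""

STAGES = ("fetch", "decode", "execute", "write-back")


def pipeline_schedule(num_instructions: int) -> list[dict[str, int]]:
    """Return, for each clock period, a mapping from stage name to instruction index."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for stage_index, stage in enumerate(STAGES):
            # Instruction i enters fetch at cycle i and advances one stage per cycle.
            instr = cycle - stage_index
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_schedule(5)):
        print(cycle, occupancy)
```

With five instructions, this model finishes in eight clock periods, whereas the sequential scheme described earlier would need twenty (four stages for each of five instructions).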
Processors that use this technique of overlapping processor stages are known as "pipelined" processors. Some processors further divide each stage into sub-stages for additional performance improvement. Such processors are referred to as "deeply pipelined" processors.
In order for a pipelined processor to operate efficiently, an instruction fetch unit at the head of the pipeline must continually provide the pipeline with a stream of processor instructions. However, branch instructions within an instruction stream prevent the instruction fetch unit from fetching subsequent instructions until the branch condition is fully resolved. In pipelined processors, the branch condition will not be fully resolved until the branch instruction reaches an instruction execution stage near the end of the processor pipeline. Hence, the instruction fetch unit will stall when an unresolved branch condition prevents it from knowing which instruction to fetch next.
To alleviate this problem, some pipelined processors use branch prediction mechanisms to predict the outcome of branch instructions. This can involve predicting the target of a branch instruction as well as predicting the condition of the branch. These predictions are used to determine a predicted path for the instruction stream in order to fetch subsequent instructions. When a branch prediction mechanism predicts the outcome of a branch instruction, and the processor executes subsequent instructions along the predicted path, the processor is said to have "speculatively executed" along the predicted instruction path. During speculative execution, the processor is performing useful work if the branch instruction was predicted correctly. However, if the branch prediction mechanism mispredicted the result of the branch instruction, the processor is speculatively executing instructions down the wrong path and is not performing useful work. When the processor eventually detects the mispredicted branch, the processor must flush all the speculatively executed instructions and restart execution from the correct address.
Branch prediction involves predicting the outcome of a branch to determine whether or not the branch is taken. Branch prediction also involves predicting the target address of a branch to determine where the branch goes if it is taken.
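The two aspects of prediction named above, direction and target, can be sketched in Python as follows. This is an illustrative model under stated assumptions, not a description of any particular processor. The 2-bit saturating counter used for direction, the dictionary standing in for a branch target buffer, the fixed 4-byte instruction size, and the class name `BranchPredictor` are all hypothetical choices.

```python
"""Sketch: predicting both a branch's direction and its target.

Assumptions (illustrative only): fixed 4-byte instructions, a 2-bit saturating
counter per branch for direction, and a dictionary standing in for a branch
target buffer. Real hardware uses fixed-size, indexed tables.
"""

INSTRUCTION_SIZE = 4  # hypothetical fixed instruction width in bytes


class BranchPredictor:
    def __init__(self) -> None:
        self.counters: dict[int, int] = {}  # branch PC -> 2-bit counter (0..3)
        self.targets: dict[int, int] = {}   # branch PC -> last observed taken target

    def predict_next_fetch(self, pc: int) -> int:
        """Return the address to fetch after the branch at `pc`."""
        taken = self.counters.get(pc, 1) >= 2  # counter values 2 and 3 mean "taken"
        if taken and pc in self.targets:
            return self.targets[pc]            # predicted target address
        return pc + INSTRUCTION_SIZE           # fall-through path

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train the predictor once the branch actually resolves."""
        counter = self.counters.get(pc, 1)
        self.counters[pc] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        if taken:
            self.targets[pc] = target
```

The saturating counter is one common way to keep a single anomalous outcome from immediately flipping the predicted direction. The target table is consulted only when the direction predictor says "taken".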
Computer systems that perform branch prediction typically store predicted target addresses in a table known as a "branch target address table" or a "branch target buffer." Branch target address tables often include a large number of entries in order to provide predicted target addresses for enough branch instructions to improve processor performance effectively. Additionally, each entry contains a branch target address, which can be many bytes in size. Consequently, a branch target address table may grow to be quite large. If the table grows too large, multiple cycles may be required to access it, and the prediction success rate will fall because the table must then be used to predict branch targets for instructions further down the pipeline.
A branch target address typically includes a page number portion, comprising higher-order bits that specify a page number, and a page offset portion, comprising lower-order bits that specify an offset into the page. Furthermore, computer instruction streams tend to exhibit a large amount of locality. This means that even though a branch target address table may contain a large number of entries, the stored target addresses tend to be concentrated in a relatively small number of pages of memory. Hence, much of the space in a conventional branch target address table is wasted storing redundant page numbers.
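The split into a page number portion and a page offset portion, and the redundancy that locality creates, can be made concrete with a short Python sketch. The 64-bit address width, the 4 KiB page size, the sample addresses, and the function names are assumptions chosen for illustration.

```python
"""Sketch: splitting branch target addresses into page number and page offset.

Assumes (hypothetically) 64-bit addresses and 4 KiB pages, so the low 12 bits
form the page offset and the high 52 bits form the page number.
"""

ADDRESS_BITS = 64
PAGE_OFFSET_BITS = 12  # 4 KiB pages
PAGE_NUMBER_BITS = ADDRESS_BITS - PAGE_OFFSET_BITS


def split_address(address: int) -> tuple[int, int]:
    """Return (page_number, page_offset) for a virtual address."""
    page_number = address >> PAGE_OFFSET_BITS
    page_offset = address & ((1 << PAGE_OFFSET_BITS) - 1)
    return page_number, page_offset


def page_number_redundancy(targets: list[int]) -> tuple[int, int]:
    """Return (number of entries, number of distinct page numbers) for stored targets."""
    pages = {split_address(t)[0] for t in targets}
    return len(targets), len(pages)


if __name__ == "__main__":
    # Hypothetical targets clustered in two pages, illustrating locality.
    targets = [0x4000_1010, 0x4000_1ABC, 0x4000_1F00, 0x4000_2040, 0x4000_2100]
    entries, pages = page_number_redundancy(targets)
    print(f"{entries} entries store page numbers from only {pages} pages")
    # A conventional table stores all PAGE_NUMBER_BITS in every entry, even when repeated.
    print("page-number bits stored:", entries * PAGE_NUMBER_BITS)
```

For the sample targets, five entries hold page numbers drawn from only two distinct pages. Under the assumed 52-bit page number, the table still spends 260 bits storing them.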
What is needed is a method and an apparatus for storing branch target addresses that reduces the size of a branch target address table by reducing the amount of storage required for branch target addresses.
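One way to picture the approach named in the field of the invention is a table that stores a small TLB slot index plus the page offset instead of a full address. Under this approach, the page number portion of a predicted target is retrieved by indexing a translation lookaside buffer. The Python sketch below is a hypothetical illustration under assumed parameters: a 64-entry TLB modeled as a list of page numbers, 4 KiB pages, and 64-bit addresses. The class and constant names are invented, and the sketch is not presented as the claimed apparatus.

```python
"""Sketch: a branch target table storing a TLB slot index instead of a page number.

Hypothetical parameters: 64-entry TLB modeled as a list of virtual page numbers,
4 KiB pages, 64-bit addresses. Illustrative only.
"""
from typing import Dict, List, Optional, Tuple

PAGE_OFFSET_BITS = 12
TLB_ENTRIES = 64
TLB_INDEX_BITS = (TLB_ENTRIES - 1).bit_length()  # 6 bits address 64 slots
FULL_ADDRESS_BITS = 64


class CompactTargetTable:
    def __init__(self, tlb_page_numbers: List[int]) -> None:
        # tlb_page_numbers[i] is the virtual page number cached in TLB slot i.
        # The list is shared, not copied, so later TLB updates are visible here.
        self.tlb = tlb_page_numbers
        self.entries: Dict[int, Tuple[int, int]] = {}  # branch PC -> (TLB slot, offset)

    def record(self, branch_pc: int, target: int) -> bool:
        """Store a target if its page is in the TLB; return False otherwise."""
        page_number = target >> PAGE_OFFSET_BITS
        offset = target & ((1 << PAGE_OFFSET_BITS) - 1)
        if page_number not in self.tlb:
            return False
        self.entries[branch_pc] = (self.tlb.index(page_number), offset)
        return True

    def predict(self, branch_pc: int) -> Optional[int]:
        """Rebuild the predicted target by reading its page number from the TLB."""
        if branch_pc not in self.entries:
            return None
        slot, offset = self.entries[branch_pc]
        return (self.tlb[slot] << PAGE_OFFSET_BITS) | offset


# Per-entry storage: TLB slot index plus page offset, versus a full address.
BITS_PER_COMPACT_ENTRY = TLB_INDEX_BITS + PAGE_OFFSET_BITS  # 18 bits
```

Under these assumptions, each entry needs 6 + 12 = 18 bits rather than 64. Because the table refers to the TLB rather than copying page numbers, replacing a TLB slot silently changes the page number returned for any entry that points at it. In this sketch that is tolerable only because a predicted target is checked when the branch resolves, like any other prediction.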