1. Field of the Invention
This invention relates in general to the field of microprocessors, and more particularly to a method and apparatus for performing branch prediction on far jump and far call instructions.
2. Description of the Related Art
In information handling systems, computer instructions are typically stored in successive addressable locations within a memory. When processed by a Central Processing Unit (CPU), the instructions are fetched from these consecutive memory locations and executed. Each time an instruction is fetched from memory, a program counter within the CPU is incremented so that it contains the address of the next instruction in the sequence. This program counter is also called the instruction pointer (IP). Fetching of an instruction, incrementing of the program counter, and execution of the instruction continue linearly through memory until a program control instruction, such as a conditional jump, an unconditional jump, or a call instruction, is encountered.
A program control instruction, when executed, changes the address in the program counter and causes the flow of control to be altered. In other words, program control instructions specify conditions for altering the contents of the program counter. The change in the value of the program counter as a result of the execution of a program control instruction causes a break in the otherwise successive sequence of instruction execution. This is an important feature in digital computers since it provides control over the flow of program execution and a capability for branching to different portions of a program. Program control instructions, which are also called branch instructions, include Jump, Test and Jump conditionally, Call, and Return.
A Jump instruction causes the CPU to unconditionally change the contents of the program counter to a specific value, i.e., to the target address of the instruction where the program is to continue execution. A Test and Jump instruction causes the CPU to test the contents of a status register, or possibly compare two values, and, based on the outcome of the test or comparison, either continue sequential execution or jump to a new address, called the target address. A Call instruction causes the CPU to unconditionally jump to a new target address, but also saves the value of the program counter to allow the CPU to return to the program location it is leaving. A Return instruction causes the CPU to retrieve the value of the program counter that was saved by the last Call instruction and return program flow to the retrieved instruction address.
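To make the program counter behavior described above concrete, the following minimal Python sketch models how each kind of program control instruction updates the program counter. It is an illustration under stated assumptions, not a description of any real CPU: the tuple instruction encoding (e.g. `("jz", target)`), the single zero flag standing in for the status register, and the class name `Cpu` are all hypothetical, and each instruction is assumed to occupy exactly one addressable location.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cpu:
    pc: int = 0                  # program counter (instruction pointer)
    zero_flag: bool = False      # hypothetical stand-in for the status register
    return_stack: List[int] = field(default_factory=list)

    def step(self, instr: Tuple) -> None:
        """Update the program counter for one hypothetical instruction tuple."""
        kind = instr[0]
        next_pc = self.pc + 1    # sequential successor (one location per instruction, assumed)
        if kind == "jmp":        # Jump: always load the target address
            self.pc = instr[1]
        elif kind == "jz":       # Test and Jump: test the flag, then jump or fall through
            self.pc = instr[1] if self.zero_flag else next_pc
        elif kind == "call":     # Call: save the return address, then jump
            self.return_stack.append(next_pc)
            self.pc = instr[1]
        elif kind == "ret":      # Return: restore the address saved by the last Call
            self.pc = self.return_stack.pop()
        else:                    # any other instruction: continue sequentially
            self.pc = next_pc


cpu = Cpu(pc=10)
cpu.step(("call", 100))  # pc becomes 100; return address 11 is saved
cpu.step(("ret",))       # pc becomes 11
```

The saved value is the already incremented program counter, which is why the Return resumes at the instruction after the Call rather than re-executing the Call.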
In early microprocessors, execution of program control instructions did not impose significant processing delays because such microprocessors were designed to execute only one instruction at a time. If the instruction being executed was a program control instruction, by the end of execution the microprocessor would know whether it should branch, and if it was supposed to branch, it would know the target address of the branch. Thus, whether the next instruction was sequential, or the result of a branch, it would be fetched and executed without significant delay.
However, modern microprocessors are not so simple. Rather, it is common for modern microprocessors to operate on several instructions at the same time within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as, “an implementation technique whereby multiple instructions are overlapped in execution.” Computer Architecture: A Quantitative Approach, second edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, Calif., 1996. The authors go on to provide the following excellent illustration of pipelining:
“A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe—instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.”
Thus, after instructions are fetched, they are introduced into one end of the pipeline. Then they proceed through pipeline stages within a microprocessor until they complete execution. In such pipelined microprocessors it is often not known whether a branch instruction will alter program flow until the instruction reaches a late stage in the pipeline. But, by this time, the microprocessor has already fetched other instructions and is executing them in earlier stages of the pipeline. If a branch causes a change in program flow, all of the instructions in the pipeline that followed the branch must be thrown out or flushed. In addition, the instruction specified by the target address of the branch instruction must be fetched. Throwing out the intermediate instructions and fetching the instruction at the target address creates processing delays in such pipelined microprocessors. To alleviate this delay problem, many pipelined microprocessors use branch prediction mechanisms in an early stage of the pipeline that predict the outcome of branch instructions, and then fetch subsequent instructions according to the branch prediction.
If the branch prediction logic correctly predicts the outcome of the branch, then program flow continues forward from the predicted address without interruption. However, if the branch prediction logic incorrectly predicts the outcome of the branch, then the instructions fetched after the branch must be flushed from the pipeline, and execution must restart from the correct path following the branch. Branch mispredictions and the resultant flushing of the pipeline are undesirable due to the amount of time lost in restarting the pipeline at the resolved target address of the mispredicted branch.
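The trade-off described above can be expressed as a simple expected-cost model. The sketch below is a simplification under stated assumptions: a scalar pipeline that fetches one instruction per cycle, a branch fetched at stage `fetch_stage` and resolved at stage `resolve_stage`, and a misprediction that discards every instruction fetched behind the branch. The function names and parameters are hypothetical.

```python
def flush_penalty(resolve_stage: int, fetch_stage: int = 1) -> int:
    """Cycles lost when a branch resolves against the prediction.

    Assumes one instruction enters the pipeline per cycle, so each stage
    between fetch and resolution holds one wrongly fetched instruction.
    """
    if resolve_stage < fetch_stage:
        raise ValueError("a branch cannot resolve before it is fetched")
    return resolve_stage - fetch_stage


def expected_branch_penalty(accuracy: float, resolve_stage: int,
                            fetch_stage: int = 1) -> float:
    """Average cycles lost per branch: only mispredictions pay the flush cost."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return (1.0 - accuracy) * flush_penalty(resolve_stage, fetch_stage)
```

Because the per-misprediction cost grows with the distance between fetch and resolution, the same prediction accuracy saves more cycles as pipelines get deeper.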
As mentioned earlier, conditional or unconditional branches are implemented by jump instructions. Jump instructions to an address within the same code segment as the jump instruction are called near jumps, while jump instructions to an address in a different code segment are called far jumps. Similarly, calls to an address within the same code segment as the call instruction are designated as near calls, while call instructions to an address in a different code segment are called far calls.
In conventional x86 pipelined microprocessors, the pipeline is flushed and refilled whenever a far jump or far call instruction is executed. This flushing action effectively slows down the operation of the microprocessor. In more detail, the execution of a far jump or far call instruction requires that a new code segment descriptor be loaded into the code segment descriptor register of the microprocessor. The term "far jump-call" is used collectively herein to indicate a far jump or far call instruction. The far jump-call instruction prescribes the new code segment descriptor along with an offset. This code segment descriptor includes a new code segment base address to which the offset is added to determine the far jump-call target address. Once this target address has been computed, it is provided to the next instruction pointer so that subsequent instructions beginning at the target address can be fetched and executed by the pipeline.
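The far jump-call target calculation described above can be sketched as follows. This minimal Python model assumes a flat 32-bit linear address space and reduces a descriptor to a base and a limit; the names `SegmentDescriptor`, `FetchUnit`, `near_jump`, and `far_jump` are hypothetical, and real x86 descriptor loading also involves selector lookup, privilege checks, and fault handling, all omitted here. It also shows the contrast between near and far transfers: a near jump reuses the current code segment base, while a far jump-call must first obtain a new descriptor.

```python
from dataclasses import dataclass

ADDRESS_MASK = 0xFFFFFFFF  # assumed 32-bit linear address space


@dataclass(frozen=True)
class SegmentDescriptor:
    base: int   # code segment base address
    limit: int  # largest valid offset within the segment (simplified)


def linear_address(descriptor: SegmentDescriptor, offset: int) -> int:
    """Target address = code segment base + offset."""
    if not 0 <= offset <= descriptor.limit:
        # Real hardware would raise a protection fault instead.
        raise ValueError("offset outside code segment limit")
    return (descriptor.base + offset) & ADDRESS_MASK


@dataclass
class FetchUnit:
    cs: SegmentDescriptor  # models the code segment descriptor register
    next_ip: int = 0       # models the next instruction pointer

    def near_jump(self, offset: int) -> None:
        # Same code segment: the current base is reused.
        self.next_ip = linear_address(self.cs, offset)

    def far_jump(self, new_cs: SegmentDescriptor, offset: int) -> None:
        # Different code segment: compute the target from the new base,
        # then load the new descriptor and redirect the next instruction pointer.
        target = linear_address(new_cs, offset)
        self.cs = new_cs
        self.next_ip = target
```

In a pipeline, the costly part is that `far_jump` cannot produce `next_ip` until the new descriptor, and therefore the new base, is known.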
Current x86 pipelined microprocessors either 1) do not perform any type of speculative branching for far jump-calls at all, or 2) utilize a branch target buffer (BTB) to predict far jump target addresses. In the 1st scenario, the new code segment descriptor prescribed by the far jump-call is loaded from memory, and the target address is calculated when the far jump-call is executed, typically in a later pipeline stage. Unfortunately, in this scenario far jump-calls incur a penalty roughly equivalent to the number of stages in the pipeline between the stage where the far jump-call instruction is fetched and the stage where it is resolved. For pipelined microprocessors having only a few stages, the penalties associated with stalling the pipeline until resolution at a later stage are not serious enough to merit any type of speculative branch logic for far jump-calls. However, to increase microprocessor throughput, designers continue to decompose the pipeline logic into ever more stages. Hence, providing no far jump-call prediction at all in a modern pipelined microprocessor results in excessive pipeline delays whenever far jump-call instructions are executed.
In the 2nd scenario, a branch target buffer (BTB), a small array in an early pipeline stage, stores code segment base entries corresponding to the N most recently executed far jump-call instructions (where N is an integer). The offset of a current far jump instruction is used to index into the far jump BTB. If a corresponding entry exists (i.e., a BTB hit), then the contents of the entry are provided to speculative address calculation logic, which calculates a speculative target address, and subsequent instructions are fetched from the speculative address forward. Unfortunately, if no corresponding entry exists in the BTB (i.e., a BTB miss), then the microprocessor pipeline is stalled until the current far jump instruction is resolved. In that case, the microprocessor incurs the same penalty as if no prediction logic had been employed at all.
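The behavior of the 2nd scenario, including the miss case, can be sketched with a small software model. The sketch assumes an N-entry table indexed by the far jump's offset, as described above, that stores the code segment base last resolved for that offset. Least-recently-used replacement, the one-instruction-per-cycle penalty model, and all names (`FarJumpBTB`, `predict`, `update`, `far_jump_penalty`) are hypothetical assumptions, not a description of any particular processor.

```python
from collections import OrderedDict
from typing import Optional


class FarJumpBTB:
    """Hypothetical N-entry far jump BTB: offset -> last resolved code segment base."""

    def __init__(self, entries: int) -> None:
        if entries <= 0:
            raise ValueError("BTB must have at least one entry")
        self.entries = entries
        self._table = OrderedDict()  # offset -> code segment base, least recent first

    def predict(self, offset: int) -> Optional[int]:
        """Speculative target address on a hit, or None on a miss."""
        base = self._table.get(offset)
        if base is None:
            return None
        self._table.move_to_end(offset)  # mark as most recently used
        return base + offset

    def update(self, offset: int, resolved_base: int) -> None:
        """Record the code segment base once the far jump-call resolves."""
        self._table[offset] = resolved_base
        self._table.move_to_end(offset)
        if len(self._table) > self.entries:
            self._table.popitem(last=False)  # evict the least recently used entry


def far_jump_penalty(btb: FarJumpBTB, offset: int, actual_base: int,
                     fetch_stage: int, resolve_stage: int) -> int:
    """Cycles lost for one far jump-call under a simple stall/flush model."""
    predicted = btb.predict(offset)
    btb.update(offset, actual_base)
    if predicted == actual_base + offset:
        return 0                        # correct speculative fetch
    return resolve_stage - fetch_stage  # miss (stall) or wrong base (flush)
```

On a cold or evicted entry the model charges the full fetch-to-resolve distance, the same cost it would charge with no predictor at all, which is the shortcoming identified above.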
What is needed is a technique for performing branch prediction on far jumps and far calls that reduces the pipeline flushing penalties associated with them. Moreover, a mechanism is needed for increasing microprocessor efficiency when far jump-calls are encountered, even when branch target buffer misses occur.