A microprocessor is a digital device that executes instructions specified by a computer program. Modern microprocessors are typically pipelined. That is, they operate on several instructions at the same time, within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as “an implementation technique whereby multiple instructions are overlapped in execution” (Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, Calif., 1996). They go on to provide the following excellent illustration of pipelining:

“A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe—instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.”
Microprocessors operate according to clock cycles. Typically, an instruction passes from one stage of the microprocessor pipeline to another each clock cycle. In an automobile assembly line, if the workers in one stage of the line are left standing idle because they do not have a car to work on, then the production, or performance, of the line is diminished. Similarly, if a microprocessor stage is idle during a clock cycle because it does not have an instruction to operate on—a situation commonly referred to as a pipeline bubble—then the performance of the processor is diminished.
Branch instructions are a potential cause of pipeline bubbles. When a branch instruction is encountered, the processor must determine the target address of the branch instruction and begin fetching instructions at the target address rather than at the next sequential address after the branch instruction. Because the pipeline stages that definitively determine the target address come well after the stages that fetch instructions, the fetch stages do not know where to fetch from until the branch reaches those later stages, and bubbles are created in the meantime. As discussed in more detail below, microprocessors typically include branch prediction mechanisms to reduce the number of bubbles created by branch instructions.
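The cost of branch bubbles can be illustrated with a simple idealized model. The sketch below assumes that one instruction enters the pipeline per clock cycle, that each unpredicted branch costs a fixed number of bubble cycles equal to the distance between the fetch stage and the stage that resolves the target address, and that no other stalls occur. The function name and the example figures are hypothetical and chosen only for illustration.

```python
def total_cycles(num_instructions, num_stages, num_branches, resolve_distance):
    """Idealized cycle count for a pipeline with branch bubbles.

    Assumptions (illustrative, not taken from any particular processor):
    - one instruction enters the pipeline per clock cycle;
    - each branch inserts `resolve_distance` bubble cycles, the number of
      stages between fetch and the stage that resolves the target address;
    - there are no other stalls.
    """
    if num_instructions == 0:
        return 0
    # Cycles to fill the pipeline and then retire one instruction per cycle.
    fill_and_drain = num_stages + num_instructions - 1
    bubbles = num_branches * resolve_distance
    return fill_and_drain + bubbles


# Hypothetical figures: 100 instructions, 10 stages, 20 branches,
# target address resolved 6 stages after fetch.
without_prediction = total_cycles(100, 10, 20, 6)
with_perfect_prediction = total_cycles(100, 10, 0, 6)
```

Under these assumed figures the bubbles more than double the cycle count, which suggests why accurate prediction of branch target addresses matters.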
One particular type of branch instruction is a return instruction. A return instruction is typically the last instruction executed by a subroutine, and its purpose is to restore program flow to the calling routine, which is the routine that caused program control to be given to the subroutine. In a typical program sequence, the calling routine executes a call instruction. The call instruction instructs the microprocessor to push a return address onto a stack in memory and then to branch to the address of the subroutine. The return address pushed onto the stack is the address of the instruction that follows the call instruction in the calling routine. The subroutine ultimately executes a return instruction, which pops off the stack the return address previously pushed by the call instruction and branches to that address. The return address is thus the target address of the return instruction. An example of a call instruction is the x86 CALL instruction. An example of a return instruction is the x86 RET instruction.
An advantage of call/return sequences is that they allow subroutine calls to be nested. For example, a main routine may call subroutine A, which pushes a return address onto the stack; subroutine A may then call subroutine B, which pushes a second return address. When subroutine B executes a return instruction, it pops the return address pushed by the call in subroutine A; when subroutine A subsequently executes a return instruction, it pops the return address pushed by the call in the main routine. Nesting subroutine calls is very useful, and the example above may be extended to as many calls deep as the stack size can support.
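The nested call/return behavior described above can be sketched with a toy model. The class below is a minimal illustration, assuming that the memory stack is a Python list, that every instruction is one address unit long, and that the addresses shown are hypothetical; it does not model any actual instruction set.

```python
class Machine:
    """Toy model of call/return using a stack in memory (a Python list)."""

    def __init__(self):
        self.stack = []   # memory stack; the end of the list is the top
        self.pc = 0       # address of the instruction being executed

    def call(self, target, instr_len=1):
        # Push the address of the instruction after the call, then branch.
        # instr_len=1 is a simplifying assumption.
        self.stack.append(self.pc + instr_len)
        self.pc = target

    def ret(self):
        # Pop the most recently pushed return address and branch to it.
        self.pc = self.stack.pop()


m = Machine()
m.pc = 100            # main routine executes a call at hypothetical address 100
m.call(target=200)    # main calls subroutine A; pushes 101
m.pc = 205            # subroutine A executes a call at hypothetical address 205
m.call(target=300)    # A calls subroutine B; pushes 206
m.ret()               # B returns into A
returned_to_a = m.pc
m.ret()               # A returns into main
returned_to_main = m.pc
```

Because the stack is last-in-first-out, each return pops the address pushed by the most recent call that has not yet returned, which is what makes nesting work.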
Because of the regular nature of call/return instruction sequences, modern microprocessors employ a branch prediction mechanism commonly referred to as a return stack to predict the target addresses of return instructions. The return stack is a small buffer that caches return addresses in a last-in-first-out manner. Each time a call instruction is encountered, the return address to be pushed onto the memory stack is also pushed onto the return stack. Each time a return instruction is encountered, the return address at the top of the return stack is popped and used as the predicted target address of the return instruction. This operation reduces bubbles, since the microprocessor does not have to wait for the return address to be fetched from the memory stack.
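A return stack of the kind described above can be sketched as a small bounded last-in-first-out buffer. The capacity of four entries and the policy of discarding the oldest entry on overflow are assumptions made for this illustration; the class and method names are hypothetical.

```python
class ReturnStack:
    """Small last-in-first-out buffer of predicted return addresses."""

    def __init__(self, capacity=4):
        self.capacity = capacity   # assumed size; actual designs vary
        self.entries = []          # the end of the list is the top

    def on_call(self, return_address):
        # Mirror the push that the call performs on the memory stack.
        if len(self.entries) == self.capacity:
            self.entries.pop(0)    # assumed policy: discard the oldest entry
        self.entries.append(return_address)

    def on_return(self):
        # Predict the target of a return instruction.
        # None means no prediction is available.
        return self.entries.pop() if self.entries else None


rs = ReturnStack()
rs.on_call(101)                      # main calls A (hypothetical addresses)
rs.on_call(206)                      # A calls B
first_prediction = rs.on_return()    # prediction for B's return
second_prediction = rs.on_return()   # prediction for A's return
```

For a standard call/return sequence such as this one, the predictions match the addresses that the return instructions will later pop from the memory stack.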
Return stacks typically predict return instruction target addresses very accurately due to the regular nature of call/return sequences. However, the present inventors have discovered that certain programs, such as certain operating systems, do not always execute call/return instructions in the standard fashion. For example, code executing on an x86 microprocessor may include a CALL, then a PUSH to place a different return address on the stack, then a RET. The RET causes a return to the pushed return address rather than to the address of the instruction after the CALL, which was pushed onto the stack by the CALL. In another example, the code performs a PUSH to place a return address on the stack, then performs a CALL, then performs two RET instructions. The second RET causes a return to the pushed return address rather than to the instruction after a CALL that preceded the PUSH. In both cases the return stack mispredicts, because the PUSH changes the memory stack without a corresponding change to the return stack.
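The two non-standard sequences described above can be replayed against a simple model to show where the prediction and the actual target diverge. The sketch below is a simplification under stated assumptions: the operation encoding, the function name, and all addresses are hypothetical, and the return stack is treated as unbounded.

```python
def run(ops):
    """Replay (op, value) pairs; return (actual, predicted) for each RET.

    Ops (simplified, hypothetical encoding):
      ("CALL", return_address)  push onto memory stack and return stack
      ("PUSH", value)           push onto memory stack only
      ("RET", None)             pop both; the memory stack gives the actual target
    """
    memory_stack, return_stack, results = [], [], []
    for op, value in ops:
        if op == "CALL":
            memory_stack.append(value)
            return_stack.append(value)
        elif op == "PUSH":
            memory_stack.append(value)
        elif op == "RET":
            actual = memory_stack.pop()
            predicted = return_stack.pop() if return_stack else None
            results.append((actual, predicted))
    return results


# First sequence: CALL, then PUSH of a different return address, then RET.
first = run([("CALL", 101), ("PUSH", 500), ("RET", None)])

# Second sequence: an earlier CALL, a PUSH of a return address, a CALL,
# then two RETs.
second = run([("CALL", 51), ("PUSH", 500), ("CALL", 101),
              ("RET", None), ("RET", None)])

mispredicted_first = [actual != predicted for actual, predicted in first]
mispredicted_second = [actual != predicted for actual, predicted in second]
```

In this model the RET of the first sequence is mispredicted, and in the second sequence the first RET is predicted correctly while the second RET is mispredicted, since the return stack supplies the address from the earlier CALL instead of the pushed address.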
Therefore, what is needed is an apparatus for more accurately predicting a return instruction target address, particularly for code that executes a non-standard call/return sequence.