1. Field of the Invention
This invention relates in general to the field of microprocessors, and more particularly to branch prediction and internal call/return stacks.
2. Description of the Related Art
Computer programs typically comprise a main program or procedure that calls other procedures, also commonly referred to as subroutines or functions. Each procedure is typically located in memory, referenced by a specific memory address. Consequently, microprocessors typically include in their instruction sets an instruction to call a procedure and an instruction to return from a procedure. When a microprocessor encounters a call instruction, the microprocessor transfers control to the procedure whose address the call instruction specifies. Once the called procedure has completed, it executes a return instruction, and the microprocessor returns control to the calling procedure at the instruction following the call instruction.
In x86 architecture microprocessors, the call and return instructions are the CALL (Call Procedure) and RET (Return from Procedure) instructions. These instructions are specified on pages 25-50 to 25-56 and 25-271 to 25-274, respectively, of the Intel Pentium Processor Family Developer's Manual Volume 3: Architecture and Programming Manual, 1995, which is hereby incorporated by reference.
The x86 architecture CALL instruction saves the address of the instruction following the CALL instruction in the main memory of the system. The RET instruction retrieves the address, referred to as the return address, from main memory and transfers control to the calling procedure at the return address.
The CALL and RET instructions implicitly use a portion of the main memory called the "stack" to save the return address. A stack is a last-in-first-out memory. The "top" of the main memory stack in an x86 architecture processor is pointed to by a Stack Pointer (SP). A CALL instruction "pushes" the return address onto the stack. That is, the CALL decrements the SP by the size of the return address and then stores the return address in main memory at the address specified by the updated SP value. Conversely, a RET instruction "pops" the return address off the stack, places the return address into the Instruction Pointer (IP) register of the microprocessor, and increments the SP by the size of the return address. That is, the RET instruction retrieves the return address from memory at the address specified by the SP register and then increments the value of the SP.
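The push and pop behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not x86 itself: the `Machine` class, the 2-byte return address, the 256-byte memory, and the `call_length` parameter are all assumptions introduced for the example.

```python
# Illustrative model only. The 2-byte return address and the 256-byte
# memory are assumptions for this sketch, not properties of any real processor.
RET_ADDR_SIZE = 2  # bytes


class Machine:
    def __init__(self, memory_size=0x100):
        self.memory = bytearray(memory_size)
        self.sp = memory_size  # the stack grows downward from the top of memory
        self.ip = 0

    def call(self, target, call_length):
        """Push the address of the instruction after the CALL, then jump."""
        return_address = self.ip + call_length
        self.sp -= RET_ADDR_SIZE  # decrement SP first
        self.memory[self.sp:self.sp + RET_ADDR_SIZE] = return_address.to_bytes(
            RET_ADDR_SIZE, "little"
        )  # then store at the updated SP
        self.ip = target

    def ret(self):
        """Load the return address at SP into IP, then increment SP."""
        self.ip = int.from_bytes(
            self.memory[self.sp:self.sp + RET_ADDR_SIZE], "little"
        )
        self.sp += RET_ADDR_SIZE


m = Machine()
m.ip = 0x10
m.call(target=0x40, call_length=3)  # return address is 0x13
m.call(target=0x80, call_length=3)  # nested call, return address is 0x43
m.ret()                             # last in, first out: back to 0x43
inner_return = m.ip
m.ret()                             # back to 0x13
outer_return = m.ip
```

After both returns, the SP is back at its starting value, which reflects the last-in-first-out discipline of the stack.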
Modern microprocessors operate on several instructions at the same time, within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as, "an implementation technique whereby multiple instructions are overlapped in execution." Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, Calif., 1996. The authors go on to provide the following excellent illustration of pipelining:
A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe--instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.
Thus, as the microprocessor fetches instructions it introduces them into one end of the pipeline. The instructions proceed through pipeline stages within the microprocessor until they complete execution.
In most systems, the time required for microprocessor accesses to main memory is much greater than the clock cycle time of the microprocessor, typically by at least an order of magnitude. Thus, when the microprocessor accesses main memory to push or pop a return address, it consumes much valuable time.
Furthermore, the stage that decodes the instructions is near the beginning of the pipeline, whereas the stage that performs memory accesses is typically near the end of the pipeline. This situation creates stalls or holes in the pipeline. These stalls are analogous to the assembly line where an early stage knows the next step is to add a certain part to the car, but the part is not available until a much later stage in the pipeline. Thus, the car must be passed down multiple stages without any useful work being done on the car until it reaches the stage with the part.
To overcome these problems, some modern x86 processors utilize a call/return stack internal to the processor to substantially parallel the stack in main memory. For example, the Advanced Micro Devices AMD-K6 processor utilizes an internal call/return stack as noted on page 7 of Chapter 2 and page 54 of Chapter 5 of the AMD-K6 MMX (TM) Enhanced Processor X86 Code Optimization Application Note issued August 1997, Publication #21828, Rev:A, Amendment/0.
When a microprocessor with an internal call/return stack executes a CALL instruction, it pushes the return address onto the internal call/return stack in addition to the main memory stack. Conversely, when a RET instruction is executed, the processor pops the return address off the internal call/return stack into the IP register. At the same time, the return address is popped from the main memory stack so that the two values may be compared to verify that the return address from the internal call/return stack is correct. While the return address is being popped from the main memory stack, the microprocessor continues fetching instructions from the address in the IP register and updating the IP register.
The internal call/return stack enables the processor to continue processing instructions in the pipeline while waiting for the return address to be fetched from the main memory stack. If the internal call/return stack return address is not correct, then the pipeline is flushed of all instructions processed after the incorrect return. Thus, the more consistent the internal call/return stack is kept with the main memory call/return stack the more effective the internal call/return stack becomes.
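The predict-then-verify behavior of an internal call/return stack can be sketched as follows. This is a minimal model under stated assumptions, not a description of any particular processor: the `Core` class, its two Python lists, and the `flushes` counter are hypothetical names introduced for illustration, and timing is not modeled.

```python
class Core:
    """Toy model: `memory_stack` stands in for the slow main memory stack,
    `internal_stack` for the fast on-chip call/return stack."""

    def __init__(self):
        self.memory_stack = []
        self.internal_stack = []
        self.flushes = 0

    def call(self, return_address):
        # A CALL pushes the return address onto both stacks.
        self.memory_stack.append(return_address)
        self.internal_stack.append(return_address)

    def ret(self):
        # Fast path: fetching proceeds from the internal stack's value.
        predicted = self.internal_stack.pop() if self.internal_stack else None
        # Slow path: the value from main memory arrives later and is compared.
        actual = self.memory_stack.pop()
        if predicted != actual:
            # Instructions fetched from `predicted` must be discarded.
            self.flushes += 1
        return actual


core = Core()
core.call(0x13)
first = core.ret()               # stacks agree, no flush

core.call(0x43)
core.internal_stack[-1] = 0x99   # force an inconsistency for demonstration
second = core.ret()              # mismatch detected, one flush recorded
```

In this sketch the main memory value always wins, so execution remains correct after a mismatch; the cost of an inconsistent internal stack is the flush.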
One instance where the internal and main memory call/return stacks may become inconsistent is when a call or return instruction is speculatively executed after a predicted conditional branch instruction. A conditional branch instruction examines a condition specified in the instruction, such as determining whether a parameter is equal to zero. The instruction branches to a specified target address if the condition is true, but executes the next sequential instruction if the condition is false. An example of a conditional branch instruction is the Jump if Condition is Met (JCC) instruction in x86 processors as specified on pages 25-190 to 25-192 of the Intel Pentium Processor Family Developer's Manual.
Typically, processors predict whether the conditional branch will be taken in order to avoid stalling the pipeline. That is, the processor predicts the outcome of the conditional branch and continues fetching and executing instructions based on the prediction. When the true outcome of the conditional branch is resolved later in the pipeline, if the prediction turns out to be incorrect, then the pipeline must be flushed of all instructions speculatively executed after the conditional branch instruction. This includes any call or return instructions the processor speculatively executed after the conditional branch instruction.
However, the situation described may cause an inconsistency between the internal and main memory call/return stacks. For example, suppose the processor predicts the outcome of a JCC instruction. Then, before resolving the JCC, the processor speculatively executes a CALL instruction, thereby pushing a return address onto the internal call/return stack. Next, the processor determines that it mispredicted the JCC and incorrectly executed the CALL. The processor has not pushed the return address onto the main memory call/return stack and will not. Consequently, the internal call/return stack is inconsistent with the main memory call/return stack because the return address has already been pushed onto the internal call/return stack. This inconsistency will likely cause a pipeline flush when the processor executes the next return instruction, as described above.
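The sequence of events in this example can be traced step by step. The following sketch uses two Python lists to stand in for the two stacks; the addresses 0x1005 and 0x2005 are arbitrary values chosen for illustration, and the model assumes, as the text describes, that nothing repairs the internal stack when the branch misprediction is detected.

```python
internal_stack = []  # on-chip call/return stack, updated early in the pipeline
memory_stack = []    # main memory stack, updated only by CALLs that complete

# 1. A CALL on the correct path completes: both stacks receive 0x1005.
internal_stack.append(0x1005)
memory_stack.append(0x1005)

# 2. A JCC is predicted. Before it resolves, a CALL on the predicted path
#    executes speculatively and pushes 0x2005 onto the internal stack only.
internal_stack.append(0x2005)

# 3. The JCC resolves as mispredicted and the pipeline is flushed.
#    The speculative CALL never writes main memory, and in this model
#    the stale entry remains on the internal stack.

# 4. The next RET pops both stacks and compares the two values.
predicted = internal_stack.pop()  # stale speculative entry
actual = memory_stack.pop()       # correct return address
mismatch = predicted != actual    # a mismatch forces another pipeline flush
```

The stale entry costs a second flush at the RET, in addition to the flush already caused by the mispredicted branch.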
This problem is further exacerbated by the fact that the microprocessor may speculatively execute multiple call and/or return instructions before the conditional branch instruction is resolved. This is particularly likely in modern microprocessors that typically have deep pipelines.
Therefore, what is needed is an apparatus and method that corrects the internal call/return stack when the microprocessor incorrectly speculatively executes one or more call and return instructions.