1. Field of the Invention
This invention relates to the field of superscalar microprocessors and, more particularly, to stack structures for subroutine return addresses employed within superscalar microprocessors.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions concurrently and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time in which the various stages of the instruction processing pipelines complete their tasks. Instructions and computed values are transferred to the next instruction processing pipeline stage at the end of the clock cycle.
Many superscalar microprocessor manufacturers design their microprocessors in accordance with the x86 microprocessor architecture. The x86 microprocessor architecture is widely accepted in the computer industry, and therefore a large body of software exists which runs only on microprocessors embodying this architecture. Microprocessors designed in accordance with the x86 architecture advantageously retain compatibility with this body of software. As will be appreciated by those skilled in the art, the x86 architecture includes a "stack" area in memory. The stack is useful for passing information between a program and a subroutine called by that program, among other things. A subroutine performs some function that a program requires, and then returns to the instruction following the call to the subroutine. Therefore, a subroutine may be called from multiple places within a program to perform its function.
In the x86 architecture, the ESP (extended stack pointer) register points to the address in memory which currently forms the top of the stack. A stack structure is a Last-In, First-Out (LIFO) structure in which values are placed on the stack in a certain order and are removed from the stack in the reverse order. Therefore, the top of the stack contains the last item placed on the stack. The action of placing a value on the stack is known as a "push", and requesting that a push be performed is a "push command". The action of removing a value from the stack is referred to as a "pop", and requesting that a pop be performed is a "pop command". When a push command is performed, the ESP register is decremented by the size (in bytes) of the value specified by the push command. The value is then stored at the address pointed to by the decremented ESP register value. When a pop command is performed, a number of bytes specified by the pop command are copied from the top of the stack to a destination specified by the pop command, and then the ESP register is incremented by the number of bytes.
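The push and pop behavior described above may be illustrated with a minimal sketch. The class name `Stack`, the dictionary used as memory, and the starting ESP value are assumptions made for illustration only; the sketch models the decrement-then-store and copy-then-increment ordering, not any particular hardware.

```python
class Stack:
    """Toy model of an x86-style stack that grows toward lower addresses."""

    def __init__(self, esp: int) -> None:
        self.esp = esp      # address of the current top of the stack
        self.memory = {}    # sparse byte-addressed memory: address -> byte

    def push(self, value: int, size: int = 4) -> None:
        # Decrement ESP by the operand size, then store at the new top of stack.
        self.esp -= size
        for i, byte in enumerate(value.to_bytes(size, "little")):
            self.memory[self.esp + i] = byte

    def pop(self, size: int = 4) -> int:
        # Copy the bytes at the top of stack, then increment ESP by the same size.
        data = bytes(self.memory[self.esp + i] for i in range(size))
        self.esp += size
        return int.from_bytes(data, "little")


stack = Stack(esp=0x1000)    # arbitrary starting address (assumption)
stack.push(0x11111111)
stack.push(0x22222222)
esp_after_pushes = stack.esp
first_popped = stack.pop()   # the last value pushed is removed first
second_popped = stack.pop()
```

After the two pushes the modeled ESP is eight bytes lower than its starting value, and the two pops restore it while returning the values in reverse order.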
Examples of the use of push and pop commands are the subroutine call and return instructions, as mentioned above. A typical subroutine call involves pushing the operands for the subroutine onto the stack, then pushing the address of the next instruction to be executed after the subroutine completes onto the stack. The subroutine is called and executes, accessing any operands it may need by indexing into the stack. After completing execution, the subroutine pops the next instruction address from the top of the stack and causes that address to be fetched by the microprocessor.
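The calling sequence described above may be sketched as follows. The subroutine name, the operand values, and the address 0x405 are hypothetical, and a Python list stands in for the stack with its last element as the top of the stack.

```python
def add_two_numbers_subroutine(stack):
    """Hypothetical subroutine that reads its operands by indexing into the stack."""
    # stack[-1] holds the return address; the operands sit just beneath it.
    second_operand = stack[-2]
    first_operand = stack[-3]
    result = first_operand + second_operand
    # On completion, pop the return address so that fetching resumes there.
    return_address = stack.pop()
    return result, return_address


stack = []             # the top of the stack is the end of the list
stack.append(7)        # push the first operand
stack.append(35)       # push the second operand
stack.append(0x405)    # push the address of the instruction after the call
result, resume_at = add_two_numbers_subroutine(stack)
```

In this sketch the operands remain on the stack after the return; removing them is left to the caller.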
The x86 microprocessor architecture, similar to other microprocessor architectures, contains branch instructions. A branch instruction is an instruction which causes the next instruction to be fetched from one of at least two possible addresses. One address is the address immediately following the branch instruction. This address is referred to as the "next sequential address". The second address is specified by the branch instruction, and is referred to as the "branch target address" or simply the "target address". Branch instructions typically select between the target address and the next sequential address based on a particular condition flag which is set by a previously executed instruction.
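The selection between the two addresses may be expressed as a short sketch. The function name and its parameters are assumptions chosen for illustration; the condition flag is modeled as a simple boolean.

```python
def next_fetch_address(branch_address, branch_length, target_address, condition_flag):
    """Return the address from which the next instruction is fetched."""
    # The next sequential address immediately follows the branch instruction.
    next_sequential_address = branch_address + branch_length
    # A set flag selects the target address; otherwise execution falls through.
    return target_address if condition_flag else next_sequential_address


taken = next_fetch_address(0x100, 2, 0x200, condition_flag=True)
not_taken = next_fetch_address(0x100, 2, 0x200, condition_flag=False)
```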
Since the next instruction to be executed after the branch instruction is not known until the branch instruction executes, superscalar microprocessors must either stall instruction fetching until the branch instruction executes (reducing performance) or predict which address the branch instruction will select when executed. When the prediction method is chosen, the resulting superscalar microprocessor may speculatively fetch and execute instructions residing at the predicted address. If the prediction is incorrect (as determined when the branch instruction executes), then the instructions following the branch instruction are discarded from the instruction processing pipeline and the correct instructions are fetched. Branch predictions are typically made when a branch instruction is decoded or when instructions are fetched, depending on the branch prediction scheme and the configuration of the microprocessor.
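The recovery step described above may be sketched in simplified form. The function `resolve_branch` and the list of instruction labels are hypothetical; an actual pipeline tracks considerably more state than this model does.

```python
def resolve_branch(predicted_address, actual_address, speculative_instructions):
    """Model what happens when a predicted branch finally executes.

    Returns (instructions kept in the pipeline, address to refetch from or None).
    """
    if predicted_address == actual_address:
        # Correct prediction: the speculative work is kept and no refetch is needed.
        return list(speculative_instructions), None
    # Misprediction: discard the speculative instructions and fetch the correct ones.
    return [], actual_address


kept, refetch = resolve_branch(0x200, 0x200, ["i1", "i2"])
flushed, redirect = resolve_branch(0x200, 0x102, ["i1", "i2"])
```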
A particularly difficult type of branch instruction to predict in the x86 microprocessor architecture is the RET/IRET instruction (or return instruction). The return instruction is a pop command, as described above. This type of branch instruction is difficult to predict because the target address (also referred to as the "return address") is not readily available when the instruction is decoded, unlike some other branch instructions. Instead, the return address is stored on the stack in a location that will be indicated by the value in the ESP register when the return instruction is executed. The value of the ESP register at the time the return instruction is decoded and the value of the ESP register at the time the return instruction is executed may differ.
Return address prediction is further complicated by the use of "fake return" instructions. Return instructions are normally used in conjunction with the CALL instruction. The CALL instruction is another special type of branch instruction which causes the address of the instruction immediately following the call to be pushed onto the stack, and then instructions are fetched from an address specified by the CALL instruction. The CALL instruction is therefore a push command, as described above, as well as a branch instruction. The instruction address placed on the stack by the CALL instruction is the address intended to be used by the return instruction as the return address. The CALL instruction can therefore be used to call a subroutine in a program, and the subroutine typically ends with a return instruction which causes instruction execution to resume at the instruction immediately following the CALL instruction.
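The pairing of CALL and return instructions may be sketched as a push combined with a branch, followed by a pop combined with a branch. The class `Cpu`, its fields, and the addresses used are assumptions for illustration; only 4-byte return addresses are modeled.

```python
class Cpu:
    """Toy model of CALL and return as stack operations combined with branches."""

    def __init__(self, esp=0x1000):
        self.esp = esp     # stack pointer
        self.memory = {}   # address -> 32-bit value
        self.eip = 0       # address of the next instruction to fetch

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, call_address, call_length, target_address):
        # Push the address of the instruction immediately following the CALL,
        # then fetch from the address the CALL specifies.
        self.push(call_address + call_length)
        self.eip = target_address

    def ret(self):
        # Pop the return address and resume fetching there.
        self.eip = self.pop()


cpu = Cpu()
cpu.call(call_address=0x400, call_length=5, target_address=0x800)
eip_in_subroutine = cpu.eip
cpu.ret()
eip_after_return = cpu.eip
```

After the return, the modeled instruction pointer holds the address immediately following the CALL, and ESP is back at its starting value.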
"Fake return" instructions are return instructions which are executed when a return address other than a return address provided by a CALL instruction is at the top of the stack. This address may be placed on the stack by executing a PUSH instruction, for example. A mechanism for predicting the target address of the return instruction which handles the existence of fake return instructions is desired.