Programs frequently include subroutine call (call) instructions and return from subroutine (return) instructions. A call instruction changes program flow from the routine, or instruction sequence, currently being fetched and executed to a different instruction sequence, or subroutine. A call instruction specifies a call address, or target address, which is the address of the first instruction of the subroutine. Additionally, the call instruction instructs the processor to save the address of the instruction that follows the call instruction, referred to as the return address. A return instruction also changes program flow to an instruction sequence other than the one currently being fetched and executed. However, a return instruction does not explicitly specify a target address. Instead, a return instruction instructs the processor to use the most recently saved return address as the address of the first instruction of the different instruction sequence, namely the routine that called the now-returning subroutine. Thus, the return instruction in the subroutine causes the processor to begin fetching at the instruction that follows the most recently executed call instruction.
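The call and return behavior described above can be illustrated with a minimal sketch. The toy instruction set ("nop", "call", "return", "halt"), the function name run, and the sample program are hypothetical and chosen only for illustration; they do not correspond to any particular processor.

```python
# Toy model (assumed for illustration): a program is a list of
# (opcode, operand) tuples, and the list index is the instruction address.

def run(program, start=0):
    """Execute the toy program and return the addresses fetched, in order."""
    saved_return_addresses = []  # the most recently saved address is last
    trace = []
    pc = start
    while True:
        opcode, operand = program[pc]
        trace.append(pc)
        if opcode == "call":
            # Save the address of the instruction that follows the call.
            saved_return_addresses.append(pc + 1)
            # Redirect fetch to the call (target) address.
            pc = operand
        elif opcode == "return":
            # No explicit target: use the most recently saved return address.
            pc = saved_return_addresses.pop()
        elif opcode == "halt":
            return trace
        else:  # "nop": fall through to the next sequential instruction
            pc += 1


PROGRAM = [
    ("nop", None),     # address 0
    ("call", 4),       # address 1: call the subroutine at address 4
    ("nop", None),     # address 2: the return address for the call above
    ("halt", None),    # address 3
    ("nop", None),     # address 4: first instruction of the subroutine
    ("return", None),  # address 5
]
```

In this sketch, fetching proceeds from the calling routine into the subroutine at address 4 and then resumes at address 2, the instruction that follows the call.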
Call and return instructions update the architectural state of the system. For example, in a conventional processor such as an x86 architecture processor, a call instruction updates an architectural stack pointer register and updates memory (i.e., pushes a return address onto a stack in memory at the stack pointer value). A return instruction pops the return address off the stack and thereby also updates the architectural stack pointer register.
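The architectural updates described above can be sketched as follows. This is a simplified model, not a description of any actual x86 implementation: the class name ArchState, the dictionary used as memory, and the 4-byte word size are assumptions made for illustration. The stack is modeled as growing toward lower addresses, as it does on x86.

```python
WORD_SIZE = 4  # assumption: 32-bit return addresses


class ArchState:
    """Hypothetical model of the state touched by call and return."""

    def __init__(self, stack_pointer):
        self.stack_pointer = stack_pointer  # architectural stack pointer register
        self.memory = {}                    # sparse memory: address -> stored word

    def call(self, return_address, target_address):
        # Push: move the stack pointer down, then store the return address there.
        self.stack_pointer -= WORD_SIZE
        self.memory[self.stack_pointer] = return_address
        return target_address  # address at which fetching continues

    def ret(self):
        # Pop: read the saved return address, then move the stack pointer up.
        return_address = self.memory[self.stack_pointer]
        self.stack_pointer += WORD_SIZE
        return return_address  # address at which fetching continues
```

In this model, a call changes both the stack pointer and memory, whereas a return changes only the stack pointer; the stale return address remains in memory below the stack pointer.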
Many conventional processors also speculatively execute instructions. That is, when the conventional processor encounters a conditional branch instruction, it predicts the branch instruction outcome (i.e., direction and target address) and continues fetching and executing instructions based on the prediction. If a call or return instruction happens to be in the predicted path of instructions, the processor does not update the architectural state associated with the call or return instruction until it is no longer executing speculatively, i.e., until it has resolved all outstanding conditional branches older than the call or return instruction. To accomplish this, a conventional processor sends the call and return instructions down to its execution units and updates the architectural state associated with the call or return instruction only after the execution units have resolved all outstanding conditional branches older than the call or return instruction. Thus, call and return instructions, like other instructions such as conditional branch instructions, flow through the various processor pipeline stages, including the execution and retirement stages, in order to be executed and retired. Consequently, the call and return instructions incur the same latency, in terms of clock cycles, that other instructions incur. Furthermore, the call and return instructions consume precious resources, for example, execution unit slots, register alias table entries, reservation station entries, or reorder buffer entries.
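The deferral described above can be illustrated with a rough sketch. The class SpeculativePipeline, its method names, and the in-order queue are hypothetical simplifications; a real processor tracks this state in structures such as a reorder buffer, and the sketch ignores latency, execution units, and all instruction types other than branches, calls, and returns.

```python
from collections import deque


class SpeculativePipeline:
    """Hypothetical model: architectural updates wait for older branches."""

    def __init__(self):
        self.in_flight = deque()  # instructions in program order, oldest first
        self.committed = []       # calls/returns whose architectural update is done

    def issue(self, name, kind):
        # kind is "branch", "call", or "return"; only branches await resolution.
        entry = {"name": name, "kind": kind, "resolved": kind != "branch"}
        self.in_flight.append(entry)
        self._retire()

    def resolve_branch(self, name, mispredicted=False):
        entries = list(self.in_flight)
        for index, entry in enumerate(entries):
            if entry["name"] == name:
                entry["resolved"] = True
                if mispredicted:
                    # Discard every younger instruction on the wrong path.
                    entries = entries[: index + 1]
                break
        self.in_flight = deque(entries)
        self._retire()

    def _retire(self):
        # Retire in order; stop at the oldest unresolved branch.
        while self.in_flight and self.in_flight[0]["resolved"]:
            entry = self.in_flight.popleft()
            if entry["kind"] in ("call", "return"):
                # Only now is the architectural state updated.
                self.committed.append(entry["name"])
```

In this sketch, a call issued behind an unresolved branch stays in flight, occupying a queue entry, until the branch resolves. If the branch turns out to be mispredicted, the call is discarded without ever updating architectural state.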
Therefore, what is needed is a microprocessor with an improved technique for allowing programs to call subroutines and return from subroutines.