1. Field of the Invention
The present invention generally relates to microprocessors, and particularly relates to managing hardware return stacks used by some types of microprocessors for accelerating returns from procedure calls.
2. Relevant Background
As microprocessors are deployed in an ever-increasing array of applications that require sophisticated functionality, increasing the microprocessor's execution speed is desirable. Additionally, in embedded applications such as portable electronic devices with limited battery power, decreasing the microprocessor's power consumption is desirable. Simply increasing a microprocessor's clock speed, however, may not yield the desired increase in system performance because various input/output bottlenecks impose constraints on the microprocessor's real-world performance. For example, off-chip memory accesses generally run slower than on-chip memory accesses, leading to the use of instruction and data caching techniques. Reduced Instruction Set Computers (RISC) generally issue one or more instructions per clock cycle, and often use instruction caching to enhance performance. Pipelined RISC processors can issue multiple instructions per clock cycle and typically make heavy use of data and instruction caching.
Instruction pre-fetching predicts which instructions will be needed and brings them into an on-chip instruction cache before the microprocessor executes them. When the correct instructions are pre-fetched, this can eliminate much of the delay associated with slower off-chip instruction memory. Most instructions execute sequentially and can be pre-fetched with confidence. Conditional branch instructions, however, may “take” a branch or not, depending on a branch condition that is typically evaluated only deep in the pipeline. To avoid the delay of waiting for this evaluation, the behavior of branch instructions is often predicted early in the pipeline, and instructions are pre-fetched from the predicted branch target address. Instruction pre-fetching methods include both static and dynamic instruction pre-fetching.
Dynamic instruction pre-fetching relies on instruction execution history, and may involve tracking the accuracy of previous taken or not-taken predictions for a given number of the most recent conditional branch instructions, for example. Static pre-fetching generally does not rely on execution history, and may be used, for example, when encountering conditional branches for the first time. One type of branch instruction for which static pre-fetching offers performance advantages is the return instruction from a called procedure, wherein the procedure's return address is predicted to support pre-fetching of instructions beginning at that predicted return address.
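By way of illustration only, history-based prediction of a conditional branch can be sketched with a two-bit saturating counter. This is one widely known scheme and is not named in the present text; the class name `TwoBitCounter`, its initial state, and the sample outcome sequence are all assumptions made for the sketch.

```python
class TwoBitCounter:
    """Illustrative two-bit saturating counter for one conditional branch.

    States 0 and 1 predict not-taken; states 2 and 3 predict taken.
    The initial state of 1 is an arbitrary assumption.
    """

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        # Predict "taken" only in the two upper states.
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


# Hypothetical execution history for a single branch.
counter = TwoBitCounter()
outcomes = [True, True, False, True]
predictions = []
for taken in outcomes:
    predictions.append(counter.predict())
    counter.update(taken)
```

In this sketch a single not-taken outcome does not flip the prediction once the counter has saturated toward taken, which is the usual motivation for keeping more than one bit of history.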
A “return stack” can be used to support static prediction of return addresses for procedure call return instructions. A typical return stack comprises a multi-level buffer. When a procedure call instruction is predicted or recognized, a corresponding return address can be taken from the execution stage of the microprocessor's instruction pipeline and pushed onto the return stack. Conversely, when a procedure return instruction is predicted or recognized, the return address currently at the top of the return stack is popped from the stack and used as the predicted return address for instruction pre-fetching.
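The push and pop behavior described above can be modeled in software as follows. This is a minimal sketch rather than a description of any particular hardware: the class name `ReturnStack`, the depth of four entries, the overflow policy (discarding the oldest entry), and the example addresses are all assumptions.

```python
from collections import deque


class ReturnStack:
    """Toy model of a multi-level return stack (depth is an assumption)."""

    def __init__(self, depth=4):
        # A bounded deque silently discards the oldest entry on overflow;
        # real hardware may handle overflow differently.
        self._entries = deque(maxlen=depth)

    def on_call(self, return_address):
        # A procedure call was predicted or recognized: push its return address.
        self._entries.append(return_address)

    def on_return(self):
        # A procedure return was predicted or recognized: pop the top entry
        # and use it as the predicted return address (None if the stack is empty).
        return self._entries.pop() if self._entries else None


rs = ReturnStack(depth=4)
rs.on_call(0x1004)  # hypothetical return address of an outer call
rs.on_call(0x2008)  # hypothetical return address of an inner call
first = rs.on_return()
second = rs.on_return()
third = rs.on_return()
```

Here the inner call's return address is predicted first and the outer call's second, mirroring the last-in, first-out order in which procedures normally return.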
Thus, in a conventional approach to managing a return stack, corresponding predicted return addresses are sequentially pushed onto the return stack as procedure calls are encountered. Conversely, return addresses are sequentially popped from the stack as procedure return instructions are encountered. This conventional approach can yield suboptimal return address predictions for multi-level procedure calls in which successive procedure calls are “chained” together, in that the return instruction of each succeeding procedure call in the chain points back to the return instruction of the preceding procedure call.
Optimally, the return address that should be predicted for the return instruction of the last procedure in the chain is the return address corresponding to the first procedure call in the chain. However, since the successive procedure calls result in sequentially pushing the return address of each nested procedure call onto the return stack, the return address popped for the return instruction of the last procedure call is that of the immediately prior calling procedure in the chain. If pre-fetching continues from that address, the next instruction fetched will be another return, which will again pop the return stack. Successively popping the return stack in this manner needlessly decreases processor performance and wastes power.
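The cost of the chained case can be illustrated with a small simulation. The sketch below assumes a hypothetical three-level chain in which the instruction following each nested call is itself a return; the addresses, the `program` map, and the function name `follow_returns` are invented for illustration and do not come from any real instruction set.

```python
# Hypothetical instruction memory: address -> instruction mnemonic.
program = {
    0x104: "add",     # instruction after the first call in the chain
    0x204: "return",  # second procedure returns right after its nested call
    0x304: "return",  # third procedure returns right after its nested call
}


def follow_returns(return_stack, program):
    """Pop predicted addresses until pre-fetching reaches a non-return.

    Returns a tuple (final_address, pop_count). Each pop beyond the first
    corresponds to a pre-fetch that landed on yet another return instruction.
    """
    pops = 0
    while return_stack:
        address = return_stack.pop()
        pops += 1
        if program.get(address) != "return":
            return address, pops
    return None, pops


# Conventional management: one entry pushed per call, top of stack is last.
stack = [0x104, 0x204, 0x304]
final_address, pops = follow_returns(stack, program)
```

Under these assumptions the conventional approach needs three pops to reach the address that the first pop would ideally have predicted, and the two intermediate pre-fetches each retrieve only another return instruction.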