1. Field of the Invention
The present invention relates generally to data processing systems, processors, and computer-implemented methods. More specifically, the present invention relates to data processing systems, processors, and computer-implemented methods for maintaining the integrity of a link stack in response to a branch misprediction or a pipeline flush.
2. Description of the Related Art
Modern high-frequency microprocessors are typically deeply pipelined devices. For efficient instruction execution in such processors, instructions are often fetched and executed speculatively, and an instruction may be fetched many cycles before it is executed. Because a branch instruction may cause instruction fetching to continue from a non-sequential location, the direction and target of each branch are predicted when the branch is fetched, so that fetching can proceed from the most likely address. The prediction is compared with the actual direction and target of the branch when the branch is executed. If the direction or target is found to have been mispredicted, the branch instruction is completed and all instructions fetched after it are flushed from the instruction pipeline. New instructions are then fetched from the sequential path if the branch resolves as not taken, or from the target path if the branch resolves as taken.
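The resolve-and-flush step described above can be modeled in a few lines of software. The following is a minimal sketch, assuming fixed 4-byte instructions and a pipeline represented as a list of in-flight instructions tagged with program-order sequence numbers; the names `InFlight`, `next_fetch_address`, and `resolve_branch` are hypothetical. Comparing the predicted next address with the actual one catches both direction and target mispredictions.

```python
from dataclasses import dataclass

INSN_SIZE = 4  # assumption: fixed-length 4-byte instructions


@dataclass
class InFlight:
    seq: int   # hypothetical program-order sequence number assigned at fetch
    addr: int  # address the instruction was fetched from


def next_fetch_address(branch_addr, taken, target):
    """Address that should follow a branch: its target if taken, else fall-through."""
    return target if taken else branch_addr + INSN_SIZE


def resolve_branch(pipeline, branch_seq, predicted_next, actual_next):
    """At execute time, compare the predicted and actual next address.

    Returns (surviving in-flight instructions, redirect address or None).
    """
    if predicted_next == actual_next:
        return pipeline, None  # correct prediction: nothing is flushed
    # Mispredict: the branch and older instructions survive; all younger ones are flushed.
    survivors = [insn for insn in pipeline if insn.seq <= branch_seq]
    return survivors, actual_next


# Branch at 0x108 (seq 2) was predicted taken to 0x200, so fetch continued there.
pipeline = [InFlight(0, 0x100), InFlight(1, 0x104), InFlight(2, 0x108),
            InFlight(3, 0x200), InFlight(4, 0x204)]
predicted = next_fetch_address(0x108, taken=True, target=0x200)
actual = next_fetch_address(0x108, taken=False, target=0x200)  # resolves not taken
pipeline, redirect = resolve_branch(pipeline, 2, predicted, actual)
print([insn.seq for insn in pipeline], hex(redirect))  # [0, 1, 2] 0x10c
```

Real hardware tracks in-flight instructions with dedicated queues and tags rather than a list, but the ordering rule is the same: everything younger than the mispredicted branch is discarded and fetch restarts at the correct address.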
Often there are a number of branches, e.g., subroutine calls and returns, between the instructions being fetched and the instructions being executed in the processor execution units. To handle subroutine calls and returns efficiently, many high-frequency microprocessors therefore employ a link stack. On a subroutine call, the address of the instruction following the call is “pushed” onto the stack. On a subroutine return, the entry at the top of the stack, which is expected to contain the address of the instruction following the original subroutine call, is “popped” from the stack. Because pushing and popping a hardware stack can normally be done when the branch is fetched, several cycles before the corresponding branch is executed in a deeply pipelined processor, such a link stack mechanism allows instruction fetching to continue across subroutine calls and returns without waiting for those branches to execute. However, the link stack can become corrupted during speculative instruction fetching and execution.
Ideally, a link stack enables the fetch logic to determine the target of a branch-to-link “bclr” instruction without the latency otherwise required to process the preceding branch-and-link “bl” instruction, update the architected link register, and read back its most current value. A branch-and-link instruction is used for a subroutine call: the processor branches to the instructions of the subroutine, and the return address is the address of the instruction following the subroutine call. The return address is stored in, or “pushed onto,” the link stack. When the processor reaches the end of the subroutine, a branch-to-link instruction branches back to the return address previously stored in the link stack; that address is retrieved from, or “popped from,” the link stack.
A link stack exploits the common programming paradigm in which a branch-to-link operation branches to the address saved by the most recent branch-and-link instruction. Although the link stack is not required for correct machine operation, it serves as a fetch accelerator: it buffers a last-in/first-out (LIFO) history of the most recent branch-and-link return addresses and makes the most recently added value available to the Instruction Fetch Address Register (IFAR) many cycles before that value is available through the normal link register write and read mechanisms. The link stack therefore provides a speculative address for the IFAR, enabling lower-latency fetches down the expected path of execution.
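The push-on-call, pop-on-return behavior described above can be sketched as a small software model. This is an illustration under stated assumptions, not a hardware design: it assumes fixed 4-byte instructions (as in the Power ISA, where `bl` and `bclr` are defined), a hypothetical capacity of eight entries, and an assumed overflow policy of discarding the oldest entry. `LinkStack` and `predict_next_fetch` are hypothetical names.

```python
INSN_SIZE = 4  # assumption: fixed 4-byte instructions, as in the Power ISA


class LinkStack:
    """LIFO buffer of return addresses used to predict bclr targets at fetch time."""

    def __init__(self, depth=8):
        self.depth = depth  # hypothetical capacity; real designs differ
        self.entries = []   # last element is the top of the stack

    def push(self, return_addr):
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # assumed overflow policy: discard the oldest entry
        self.entries.append(return_addr)

    def pop(self):
        # None means "no prediction": fetch must wait for the architected link register.
        return self.entries.pop() if self.entries else None


def predict_next_fetch(stack, addr, opcode, target=None):
    """Predicted next fetch address for one fetched instruction (hypothetical helper)."""
    if opcode == "bl":        # subroutine call: remember the return address
        stack.push(addr + INSN_SIZE)
        return target
    if opcode == "bclr":      # subroutine return: use the most recent return address
        return stack.pop()
    return addr + INSN_SIZE   # any other instruction: sequential fetch


ls = LinkStack()
trace = [
    predict_next_fetch(ls, 0x1000, "bl", target=0x2000),  # main calls f
    predict_next_fetch(ls, 0x2010, "bl", target=0x3000),  # f calls g
    predict_next_fetch(ls, 0x3008, "bclr"),               # g returns
    predict_next_fetch(ls, 0x2020, "bclr"),               # f returns
]
print([hex(a) for a in trace])  # ['0x2000', '0x3000', '0x2014', '0x1004']
```

Each return is predicted as soon as the `bclr` is fetched, without waiting for the preceding `bl` to update the link register; the nested calls return in LIFO order.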
Accurate maintenance of the link stack becomes more complex as the rate of instruction fetch increases. With aggressive speculation, the link stack must accept “speculative pushes,” in which each branch-and-link is treated as a link stack “push” operation, and “speculative pops,” in which each branch-to-link instruction is treated as a link stack “pop.” Branch prediction provides a guess as to the direction of each branch, i.e., whether the branch will be “taken” or “not taken.” Until the actual direction of a conditional branch is resolved, the branch, the instructions that follow it, and the associated link stack operations are considered speculative. If no branch is mispredicted, this speculation has no adverse effect on the state of the link stack. However, when a branch is found to have been mispredicted, the incorrect machine state must be eliminated, including any entries speculatively pushed onto or popped from the link stack.
The link stack is also susceptible to corruption by flushes initiated by other units for reasons unrelated to branch misprediction. Because the processor executes speculatively, several correctly predicted branches may be in flight when a flush occurs that requires the fetch logic to back up and re-execute instructions. The link stack must then be restored to the state it had when the first flushed instruction was originally fetched.
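The recovery requirement in the two paragraphs above, restoring the link stack to its exact state before a given instruction after either a misprediction or an unrelated flush, can be made concrete with a deliberately brute-force model. This sketch is not the mechanism of the present invention: it simply copies the whole stack before every speculative push or pop, which would be far too costly in hardware. The sequence numbers, the `flush_from` and `retire` methods, and the assumption that operations arrive in program order are all hypothetical.

```python
class CheckpointedLinkStack:
    """Brute-force illustration: snapshot the whole stack before every
    speculative push or pop, keyed by the instruction's sequence number."""

    def __init__(self):
        self.entries = []  # current speculative stack; last element is the top
        self.history = []  # (seq, copy of entries before that op); assumes program order

    def push(self, seq, return_addr):  # speculative bl
        self.history.append((seq, list(self.entries)))
        self.entries.append(return_addr)

    def pop(self, seq):  # speculative bclr
        self.history.append((seq, list(self.entries)))
        return self.entries.pop() if self.entries else None

    def flush_from(self, seq):
        """Restore the stack to its state just before instruction `seq` was
        fetched, undoing the effects of `seq` and every younger instruction."""
        for i, (op_seq, saved) in enumerate(self.history):
            if op_seq >= seq:
                self.entries = saved
                del self.history[i:]
                return
        # No stack operation at or after `seq`: the current state is already correct.

    def retire(self, seq):
        """Discard checkpoints of completed instructions (seq and older)."""
        self.history = [(s, e) for (s, e) in self.history if s > seq]


# Misprediction: Push A, Pop, Push B; the pop's branch (seq 2) was mispredicted.
ls = CheckpointedLinkStack()
ls.push(1, 0xA)
ls.pop(2)
ls.push(3, 0xB)
ls.flush_from(2)  # undo operations 2 and 3; re-fetch then follows the correct path
print([hex(a) for a in ls.entries])  # ['0xa']

# Flush from another unit while correctly predicted branches are in flight.
ls2 = CheckpointedLinkStack()
ls2.push(1, 0x10)
ls2.push(2, 0x20)
ls2.pop(5)
ls2.push(7, 0x30)
ls2.flush_from(6)  # re-execute from instruction 6; effects of 1, 2, and 5 are kept
print([hex(a) for a in ls2.entries])  # ['0x10']
```

The full copies make the required behavior easy to check, and they also show why a practical design needs a cheaper way to preserve overwritten entries.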
Traditional approaches to link stack management are insufficient for superscalar processors that employ aggressive fetch and branch speculation. A simple stack mechanism maintains a single top-of-stack pointer: a pop reads the entry at the pointer and then decrements the pointer, and a push increments the pointer and then writes the new entry at the pointer. In such a traditional stack, consider the following sequence:
1) Push A;
2) Pop;
3) Push B.
The traditional stack is quickly corrupted if all three operations are speculative when initiated and operation 2, the “pop” of entry A, is later found to belong to a mispredicted branch whose results must be erased. The simple stack removes entry A in operation 2, and operation 3 then overwrites the slot that held A with B, destroying the state that existed before operation 2 and making recovery impossible.
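The corruption described above can be reproduced with a few lines modeling the single-pointer stack. This is a minimal sketch: `SimpleStack` is a hypothetical name, there is no overflow or underflow checking, and the attempted recovery (remembering only the top-of-stack pointer before operation 2) is an assumption chosen to show why restoring the pointer alone cannot bring back entry A.

```python
class SimpleStack:
    """Traditional stack: one top-of-stack (TOS) pointer over a fixed array."""

    def __init__(self, depth=8):
        self.slots = [None] * depth
        self.tos = -1  # index of the current top entry; -1 means empty (no bounds checks)

    def push(self, value):
        self.tos += 1                  # increment first...
        self.slots[self.tos] = value   # ...then write at the new top

    def pop(self):
        value = self.slots[self.tos]   # read at the top first...
        self.tos -= 1                  # ...then decrement
        return value


stack = SimpleStack()
stack.push("A")          # 1) speculative push A
saved_tos = stack.tos    # assumed recovery point: only the pointer before operation 2
stack.pop()              # 2) speculative pop of A, later found to be mispredicted
stack.push("B")          # 3) speculative push B reuses the slot that held A
stack.tos = saved_tos    # "undo" operations 2 and 3 by restoring the pointer
print(stack.slots[stack.tos])  # B: entry A has been overwritten and cannot be recovered
```

The pointer is restored correctly, but the slot it points to now holds B, so the stack content as of operation 2 is lost.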
Previous approaches describe methods of using physical registers as temporary storage for general registers. Such a technique is useful for a link register stack, but it provides no underlying control mechanism to manage the speculation and recovery required by link stack operations.