Modern high-frequency microprocessors are typically deeply pipelined devices. For efficient instruction execution in such processors, instructions are often fetched and executed speculatively, and an instruction may be fetched many cycles before it is executed. Since a branch instruction may cause instruction fetching to start from a non-sequential location, the direction and target of a branch instruction are predicted when the branch is fetched so that instruction fetching can proceed from the most likely address. The prediction is compared with the actual direction and target of the branch instruction when the instruction is executed. If it is determined that the branch has been mispredicted (in either its target or its direction), the branch instruction is completed and all instructions fetched after the branch are flushed out of the instruction pipeline. New instructions are then fetched either from the sequential path of the branch (if the branch is resolved as not taken) or from the target path of the branch (if the branch is resolved as taken).
Often there is a significant number of branches (including subroutine calls and returns) between the instructions that are being fetched and the instructions that are being executed in the device execution units. Therefore, to handle subroutine calls and returns efficiently, many high-frequency microprocessors employ a link stack. On a subroutine call, the address of the following instruction is “pushed” onto the stack, while on a subroutine return, the entry at the top of the stack (which is expected to contain the address of the instruction following the original subroutine call) is “popped” from the stack. Since pushing onto and popping from a hardware stack is fast and can normally be done several cycles before the corresponding branches are executed in a deeply pipelined processor, such a link stack mechanism greatly helps implement efficient instruction fetching across subroutine calls and returns. Nevertheless, the link stack can become corrupted during the process of speculative instruction fetching and execution.
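The push and pop behavior described above can be illustrated with a minimal software sketch. It is a toy model only, not the hardware design: the class name LinkStack, the helper names fetch_call and fetch_return, the depth of 8 entries, the drop-oldest overflow policy, and the four-byte instruction width are all assumptions made for illustration.

```python
INSTRUCTION_BYTES = 4  # assumed fixed instruction width


class LinkStack:
    """Toy model of a fetch-time link stack (hypothetical, fixed depth)."""

    def __init__(self, depth=8):
        self.depth = depth      # assumed capacity; real designs vary
        self.entries = []       # last element is the top of the stack

    def push(self, return_address):
        # Assumed overflow policy: discard the oldest entry when full.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def pop(self):
        # An empty stack yields no prediction.
        return self.entries.pop() if self.entries else None


def fetch_call(stack, call_site):
    """On fetching a subroutine call, push the address of the next instruction."""
    stack.push(call_site + INSTRUCTION_BYTES)


def fetch_return(stack):
    """On fetching a subroutine return, pop the predicted return address."""
    return stack.pop()


# Two nested calls followed by two returns.
stack = LinkStack()
fetch_call(stack, 0x1000)
fetch_call(stack, 0x2000)
predicted_inner = fetch_return(stack)
predicted_outer = fetch_return(stack)
```

In this sketch the inner return is predicted to go to the instruction after the second call and the outer return to the instruction after the first call, matching the last-in, first-out nesting of subroutine calls.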
Consider, for example, the case where a subroutine call is performed using a “branch and link” instruction and a return from subroutine is achieved using a “branch to link register” or “bclr” instruction. It may happen that a “bclr” instruction, which for example returns to a location “A”, is fetched speculatively, followed by a speculative fetch of a “branch and link” instruction, for example from call-site “B”. The link stack is updated at fetch time, such that after these instructions are fetched, the address location “A” is replaced by the address location “B+4” (assuming each instruction consists of four bytes) at the top of the link stack. Since both the “bclr” and “branch and link” instructions are speculatively fetched, they may not ultimately be in the execution path. If these instructions are not in fact in the execution path (in which case the instructions are flushed out), the link stack becomes corrupted.
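The corruption scenario just described can be traced step by step in a short sketch. The concrete addresses chosen for A and B are hypothetical, and the link stack is modeled as a plain list whose last element is the top of the stack; no recovery mechanism is modeled.

```python
INSTRUCTION_BYTES = 4   # each instruction consists of four bytes, as in the text

A = 0x1000              # hypothetical return location of the "bclr"
B = 0x2000              # hypothetical call site of the "branch and link"

# State before speculation: the top of the link stack holds A.
link_stack = [A]

# Speculative fetch of "bclr": the stack is popped at fetch time.
predicted_target = link_stack.pop()

# Speculative fetch of "branch and link" at B: B+4 is pushed at fetch time.
link_stack.append(B + INSTRUCTION_BYTES)

# The speculation turns out to be wrong and both instructions are flushed,
# but nothing in this model undoes the fetch-time updates.
top_after_flush = link_stack[-1]

# A later return on the correct path would expect A at the top of the stack.
corrupted = top_after_flush != A
```

After the flush, the top of the stack holds B+4 rather than A, so the next return on the correct path would be predicted to the wrong address. Because the slot that held A has been overwritten, restoring only a top-of-stack pointer would not bring A back in this model.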
Generally, any time one or more “bclr” instructions are followed by one or more “branch and link” instructions in the speculated path, the link stack becomes corrupted if the speculation turns out to be wrong. For a commercial programming workload, about 2% of the instructions are “bclr” instructions, and it is therefore very important to be able to predict the target address for these instructions with a good degree of accuracy in deeply pipelined machines. Thus, the need has arisen for circuits, systems and methods for recovering a link stack from mis-speculation.