The present invention relates generally to the field of processors and in particular to a system and method for repairing a link stack corrupted by speculative instruction execution.
Modern software is modular in nature, with specific functions being implemented in subroutines. To efficiently implement subroutine calls and returns, many modern processors employ circuits that implement (or emulate) a link stack. A link stack is a logical structure for storing link addresses that is visible only to the hardware and not directly accessible to the programmer. An instruction that calls (branches to) a subroutine, such as a branch and link instruction, “pushes” the address of the following instruction onto the link stack. Upon encountering a return-type instruction in the subroutine, the link stack is “popped” to yield the address of the instruction following the one that made the subroutine call. As subroutines call other subroutines, link addresses are successively pushed onto the link stack, and popped as the subroutines complete execution and return.
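The push and pop behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the fixed depth of eight entries, the four-byte instruction size, and the names `LinkStack`, `call`, and `ret` are hypothetical and are not drawn from any particular processor.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed-width 4-byte instructions


class LinkStack:
    """Toy model of a hardware link stack as a fixed-depth circular buffer."""

    def __init__(self, depth=8):  # depth is an illustrative assumption
        self.depth = depth
        self.entries = [None] * depth
        self.top = 0  # index of the next free slot

    def push(self, link_address):
        self.entries[self.top % self.depth] = link_address
        self.top += 1

    def pop(self):
        self.top -= 1
        return self.entries[self.top % self.depth]


def call(stack, pc, target):
    """Branch-and-link: push the address of the following instruction."""
    stack.push(pc + INSTRUCTION_SIZE)
    return target


def ret(stack):
    """Return-type instruction: pop the most recent link address."""
    return stack.pop()


stack = LinkStack()
call(stack, 0x1000, 0x2000)   # outer call pushes 0x1004
call(stack, 0x2010, 0x3000)   # nested call pushes 0x2014
inner_return = ret(stack)     # returns into the outer subroutine
outer_return = ret(stack)     # returns to the original caller
```

The nested call returns first, so the link addresses come back in the reverse of the order in which they were pushed.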
Most modern processors employ a pipelined architecture, where sequential instructions, each having multiple execution steps, are overlapped in execution. For maximum performance, the instructions should flow continuously through the pipeline. Any situation that causes instructions to stall in the pipeline detrimentally affects performance.
Virtually all real-world programs include conditional branch instructions, the actual branching behavior of which is not known until the instruction is evaluated deep in the pipeline. To avoid pipeline stalls that would result from waiting for actual evaluation of each branch instruction, many modern processors employ some form of branch prediction, whereby the branching behavior of conditional branch instructions is predicted early in the pipeline. Based on the predicted branch evaluation, the processor speculatively fetches and executes instructions from a predicted address—either the branch target address (if the branch is predicted taken) or the next sequential address after the branch instruction (if the branch is predicted not taken). When the actual branch behavior is determined, if the branch was mispredicted, the speculatively fetched instructions are flushed from the pipeline, and new instructions are fetched from the correct next address. Mispredicted branches adversely impact both performance and power consumption.
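The choice of speculative fetch address, and the redirect that follows a misprediction, may be sketched as follows. The function names and the four-byte instruction size are assumptions made for illustration only.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed-width 4-byte instructions


def predicted_fetch_address(branch_pc, target, predicted_taken):
    """Address from which instructions are speculatively fetched."""
    return target if predicted_taken else branch_pc + INSTRUCTION_SIZE


def resolve_branch(branch_pc, target, predicted_taken, actually_taken):
    """Once the branch is evaluated, report whether the speculatively
    fetched instructions must be flushed and where fetching resumes."""
    correct_address = target if actually_taken else branch_pc + INSTRUCTION_SIZE
    must_flush = predicted_taken != actually_taken
    return must_flush, correct_address


# A branch at 0x100 with target 0x200 is predicted taken but is not taken.
speculative = predicted_fetch_address(0x100, 0x200, predicted_taken=True)
flush, redirect = resolve_branch(0x100, 0x200,
                                 predicted_taken=True, actually_taken=False)
```

In this example the processor fetches from the branch target, later discovers the misprediction, flushes, and resumes at the next sequential address.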
Another consequence of mispredicted branches may be corruption of the link stack. If speculatively executed instructions following a mispredicted branch include a subroutine return and subsequent call, a valid link address will be popped from the link stack and a new link address pushed onto the stack. Depending on the link stack hardware implementation, erroneously popping the link stack may not itself have adverse consequences, as popping the stack merely moves a read pointer; the data remain in the link stack buffer. Subsequently erroneously pushing a new value onto the link stack, however, may overwrite the previous value. When the branch misprediction is detected and the proper instruction stream is fetched and executed, a subroutine return will transfer control to the wrong location if the link stack corruption is not detected and repaired.
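The corruption scenario above can be reproduced with a minimal model in which a pop only moves a pointer and a push writes into the buffer. The class name, the buffer depth, and the specific addresses are hypothetical; actual link stack hardware may differ.

```python
class LinkStack:
    """Toy link stack: pop moves a pointer, push overwrites a buffer slot."""

    def __init__(self, depth=4):  # depth is an illustrative assumption
        self.buffer = [0] * depth
        self.top = 0  # index of the next slot to be written

    def push(self, link_address):
        self.buffer[self.top] = link_address
        self.top += 1

    def pop(self):
        self.top -= 1
        return self.buffer[self.top]


stack = LinkStack()
stack.push(0x1004)            # valid call on the correct path

# Speculative path following a mispredicted branch:
stack.pop()                   # speculative return: only the pointer moves
still_intact = stack.buffer[0] == 0x1004   # the data remain in the buffer
stack.push(0x5008)            # speculative call: overwrites 0x1004

# Misprediction detected; the correct path now executes its return.
wrong_return = stack.pop()
```

The pointer ends up where it started, so nothing looks amiss, yet the return on the correct path yields the speculative link address rather than the valid one.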
One way to avoid link stack corruption is to disallow link stack updates by speculative instructions. For example, link stack updates may be deferred until all conditional branches are resolved. Because branch evaluation occurs in execute pipe stages, this would effectively move link stack updates deep into the pipeline. However, this would detrimentally affect performance for short subroutines by effectively denying them the use of the link stack functionality. Accordingly, to gain maximum performance from the link stack hardware, the link stack is preferably updated early in the pipeline, such as at a decode pipe stage.
One known approach to guarding against corruption of processor resources (such as register renaming buffers and the like) due to branch mispredictions is to maintain a parallel, “committed” copy of the resource. The committed copy is only updated when instructions that alter its state commit for execution. An instruction confirms its own execution when it is ascertained that no hazards exist that would preclude the instruction from completing execution. For example, an instruction that implements an arithmetic or logical operation may confirm execution when all of its operands are available (that is, they have been calculated by other instructions or have been successfully retrieved from memory). An instruction commits for execution when it, and all instructions ahead of it in the pipeline, are confirmed.
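The distinction between confirming and committing can be made concrete with a short sketch. It assumes, purely for illustration, that the pipeline is represented as a list of confirmation flags in program order, oldest instruction first; the function name is hypothetical.

```python
def committed_count(confirmed):
    """Count how many instructions have committed.

    `confirmed` lists one boolean per instruction in program order,
    oldest first. An instruction commits only when it and every
    instruction ahead of it are confirmed, so the committed
    instructions form the longest all-confirmed prefix.
    """
    count = 0
    for is_confirmed in confirmed:
        if not is_confirmed:
            break
        count += 1
    return count


# The fourth instruction is confirmed, but the third is not,
# so only the first two instructions have committed.
pipeline = [True, True, False, True]
committed = committed_count(pipeline)
```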
To avoid corruption due to branch mispredictions, the state of a working copy of a processor resource is altered on an ongoing basis during the routine execution of instructions; however, the state of a committed copy of the processor resource is only altered by instructions that have committed for execution. When a branch misprediction is detected, the committed copy of the processor resource is copied over to, and replaces the contents of, the working copy. This technique restores the working copy to the state it had prior to the speculative execution of any instruction.
It is possible to apply this approach to the link stack corruption problem. A working link stack would be updated by instructions in the pipeline implementing subroutine calls and returns. A committed link stack would only be updated by subroutine call and return instructions that have committed for execution. Upon discovering a branch misprediction, the committed copy of the link stack would simply be copied over to the working link stack. However, this approach is costly in terms of both silicon area and power consumption. Replicating the link stack requires duplicating the registers or other memory structures that implement it, along with the control logic necessary to manage the link stacks. The extra hardware occupies valuable integrated circuit area, increases wiring congestion, and complicates clock and power distribution. Continuously updating two complete link stacks consumes, nominally, twice the power of running only one link stack. Particularly in processors deployed in mobile electronic devices, minimizing power consumption is critical to preserve battery life and reduce heat dissipation.
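The working and committed link stack scheme may be sketched as follows. This is a simplified model, not a description of any actual implementation: the class and method names are hypothetical, and unbounded Python lists stand in for the fixed-size hardware buffers.

```python
class DualLinkStack:
    """Toy model of a working link stack backed by a committed copy."""

    def __init__(self):
        self.working = []    # updated early, by instructions in the pipeline
        self.committed = []  # updated only by committed calls and returns

    def speculative_call(self, link_address):
        self.working.append(link_address)

    def speculative_return(self):
        return self.working.pop()

    def commit_call(self, link_address):
        self.committed.append(link_address)

    def commit_return(self):
        self.committed.pop()

    def recover(self):
        """On a branch misprediction, copy the committed stack over the
        working stack, discarding all speculative updates."""
        self.working = list(self.committed)


stacks = DualLinkStack()
stacks.speculative_call(0x1004)   # call updates the working stack early
stacks.commit_call(0x1004)        # the same call later commits

# Speculative path following a mispredicted branch:
stacks.speculative_return()       # pops the valid link address
stacks.speculative_call(0x5008)   # pushes a bogus link address

stacks.recover()                  # misprediction detected
restored = stacks.speculative_return()
```

Every call and return updates both structures, which reflects the duplicated storage and the roughly doubled update activity noted above.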