Many performance considerations go into the design of a modern microprocessor system. One such consideration is the handling of returns from subroutines. How a microprocessor handles returns from subroutines is important to performance because subroutine calls and returns occur frequently. Therefore, the more efficiently subroutine returns are handled, the greater the overall performance achieved by a data processing system.
Subroutine calls can be supported with the use of a link register (LR) and a branch-and-link instruction. The branch-and-link instruction will be referred to as the "branch-and-link". The branch-and-link is defined to be an instruction which branches to an address specified by a generic subroutine call while saving the address of the instruction after the branch-and-link into the link register. Note that the branch-and-link will overwrite the existing value in the link register. Subroutine returns can be supported by a branch-to-link instruction (which from now on will be referred to as "branch-to-link"). The branch-to-link is defined to be an instruction which uses the value in the link register as the next instruction fetch address.
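As a rough illustration of these two definitions, the following Python sketch models only the program counter and the link register. The class `Core`, its method names, and the numeric addresses are hypothetical. The 4-byte instruction word follows the FIG. 1 example. This is not an implementation of any particular instruction set.

```python
WORD = 4  # instruction word length, as in the FIG. 1 example


class Core:
    """Toy model of the link register (LR); all names are hypothetical."""

    def __init__(self, pc: int) -> None:
        self.pc = pc    # address of the instruction being executed
        self.lr = None  # link register, empty until the first branch-and-link

    def branch_and_link(self, target: int) -> None:
        # Save the address of the instruction after the branch-and-link,
        # overwriting whatever the LR held before, then branch.
        self.lr = self.pc + WORD
        self.pc = target

    def branch_to_link(self) -> None:
        # Use the current LR value as the next instruction fetch address.
        self.pc = self.lr


core = Core(pc=0x100)
core.branch_and_link(0x400)  # call: LR = 0x104, PC = 0x400
core.pc += 3 * WORD          # callee executes three instructions (PC = 0x40C)
core.branch_and_link(0x800)  # nested call: LR overwritten with 0x410
core.branch_to_link()        # returns to 0x410; the value 0x104 is gone
print(hex(core.pc))
```

The nested call shows why the original return address must be saved somewhere before a second branch-and-link is executed.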
Because the link register is overwritten whenever a branch-and-link is executed, a means of saving off the current value of the link register is provided by a "move from link register" type instruction, which from now on will be referred to as MFLR. An MFLR copies the current value of the link register into a general purpose register (GPR). Note that after the current value of the link register is copied to the GPR, the value can be stored out to memory using a store instruction. It should be noted that the terms MFLR, GPR, and other specific references throughout the present application are used to describe generic elements or events, and are not meant to limit the present invention to any particular technology type.
An explicit means for moving a value into the link register is provided by a "move to link register" instruction (which from now on will be referred to as MTLR) which copies a specified GPR value into the link register. Note that unlike a branch-and-link which branches and updates the link register, the MTLR only updates the link register without having to execute a branch.
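The save/restore path formed by MFLR, a store, a load, and MTLR can be sketched as follows. This sketch assumes 32 GPRs and uses a Python dictionary to stand in for memory. The register numbers and the address `SAVE_SLOT` are hypothetical illustration values.

```python
class Core:
    """Toy register file: LR, GPRs, and a dict standing in for memory."""

    def __init__(self, num_gprs: int = 32) -> None:  # 32 GPRs is an assumption
        self.lr = 0
        self.gpr = [0] * num_gprs
        self.mem = {}

    def mflr(self, rt: int) -> None:
        self.gpr[rt] = self.lr          # copy LR into a GPR

    def mtlr(self, rs: int) -> None:
        self.lr = self.gpr[rs]          # copy a GPR into LR; no branch is taken

    def store(self, rs: int, addr: int) -> None:
        self.mem[addr] = self.gpr[rs]

    def load(self, rt: int, addr: int) -> None:
        self.gpr[rt] = self.mem[addr]


SAVE_SLOT = 0x2000           # hypothetical memory location for the saved LR

core = Core()
core.lr = 0x104              # value left by an earlier branch-and-link
core.mflr(0)                 # save off the LR into G0 ...
core.store(0, SAVE_SLOT)     # ... and store it out to memory
core.lr = 0x410              # a nested branch-and-link overwrites the LR
core.load(3, SAVE_SLOT)      # later, reload the saved value into G3
core.mtlr(3)                 # restore the LR without branching
print(hex(core.lr))
```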
A typical instruction sequence for a series of subroutine calls and returns is shown in FIG. 1. Specifically, the instructions are listed along with corresponding addresses and labels. In this figure, instructions beginning at the label HERE1 can be envisioned as part of a subroutine. A call to the subroutine starting at HERE1 is made at address A via the branch-and-link instruction BL. As a result of this branch-and-link, the address A+4 is saved into the link register. Note that 4 is the instruction word length. The program beginning at HERE1 saves the link register value to memory via the MFLR and store instructions, and continues executing instructions.
Within the HERE1 subroutine, another subroutine call is made via a branch-and-link to HERE2. This branch-and-link causes the value B+4 to be stored into the link register and causes a branch to HERE2. In the program beginning at label HERE2, the link register is not saved because, in this example, no other subroutine calls are going to be made, and therefore the current link register value can be used for the return. The return from HERE2 back to the HERE1 subroutine is accomplished with a branch-to-link instruction, which causes a branch back to the address specified by the value in the link register (B+4, where 4 is the instruction size). The return from the HERE1 subroutine is made by loading the saved link register value (A+4) back from memory into a GPR (G3 in the example), performing an MTLR, which puts the value in G3 (A+4) into the link register, and then performing a branch-to-link.
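The FIG. 1 sequence can be replayed step by step. The figure itself is not reproduced here, so the numeric values for A, HERE1, HERE2, and B are hypothetical. The sketch also assumes that the BL to HERE2 directly follows the MFLR and store in HERE1, and that the MFLR targets G0. The function records the two return targets taken by the branch-to-link instructions.

```python
WORD = 4
# Hypothetical addresses standing in for the labels of FIG. 1.
A, HERE1, HERE2 = 0x100, 0x400, 0x800
B = HERE1 + 2 * WORD   # assumption: BL HERE2 follows the MFLR and store
SAVE_SLOT = 0x2000     # hypothetical location where HERE1 saves the LR


def run_fig1():
    """Replay the FIG. 1 calls and returns; return both return targets."""
    mem, gpr = {}, {}
    returns = []

    lr = A + WORD            # A: BL HERE1 saves A+4 in the LR
    gpr[0] = lr              # HERE1: MFLR into G0 (register is hypothetical)
    mem[SAVE_SLOT] = gpr[0]  # store G0 out to memory
    lr = B + WORD            # B: BL HERE2 overwrites the LR with B+4
    returns.append(lr)       # C: branch-to-link back into HERE1 at B+4
    gpr[3] = mem[SAVE_SLOT]  # HERE1: load the saved LR value into G3
    lr = gpr[3]              # MTLR G3 puts A+4 back into the LR
    returns.append(lr)       # B+x: branch-to-link back to A+4
    return returns


print([hex(r) for r in run_fig1()])
```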
A problem occurs if, when a branch-to-link is executed to return from a subroutine, the link register has not yet been updated with the value from memory. This delay in updating the LR is typical, because memory reference latencies are long compared to the clock speeds of modern microprocessors. While waiting for the link register value to be updated, the next instruction fetch address is not known and no new instructions are fetched. The processor therefore may not be able to do useful work, which degrades performance.
Prior art reduces this performance degradation with the use of a link stack (LS). The LS provides a means for predicting the link register value to be used as the next instruction fetch address. The link stack is discussed in terms of having a "current location". The current location references the next fetch address to be predicted by the link stack. The current location may be stored in a circular buffer having a top pointer and a bottom pointer, or in a buffer having a fixed bottom and a pointer referencing the top current location. Alternatively, the link stack may consist of a Last-In-First-Out (LIFO) buffer wherein each data value is shifted toward the current location as the current location is retrieved.
The current location is updated with the same value as the link register whenever a branch-and-link is executed. When the current location is updated, the previous current locations are pushed one entry deeper into the stack. When a branch-to-link is executed and the link register is not available, the value of the current location of the link stack is predicted to be the next instruction fetch address and is used as such. When this occurs, the current location is popped from the stack and the remaining entries move up one entry, so that the next entry on the stack becomes the current location. It would be understood by one skilled in the art that the link stack has a finite depth (e.g., 8 locations).
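A minimal sketch of the LIFO variant is shown below, using Python's `collections.deque` with `maxlen` to model the finite depth. The class and method names are hypothetical. A hardware design might instead use one of the pointer-based buffers described above. When a push overflows a full stack, the deepest (oldest) entry is discarded.

```python
from collections import deque


class LinkStack:
    """Finite-depth LIFO link stack; the newest entry is the current location."""

    def __init__(self, depth: int = 8) -> None:
        # With maxlen set, appending to a full deque silently drops the
        # oldest (deepest) entry, modelling the finite depth.
        self._entries = deque(maxlen=depth)

    def push(self, return_addr: int) -> None:
        # On branch-and-link: the LR value becomes the new current location,
        # and the previous entries sit one entry deeper.
        self._entries.append(return_addr)

    def pop(self):
        # On branch-to-link with the LR unavailable: the current location is
        # the predicted fetch address, and the next entry becomes current.
        return self._entries.pop() if self._entries else None

    def current(self):
        return self._entries[-1] if self._entries else None

    def __len__(self) -> int:
        return len(self._entries)


ls = LinkStack(depth=2)
for addr in (0x104, 0x208, 0x30C):  # three calls into a two-entry stack
    ls.push(addr)
print([ls.pop(), ls.pop(), ls.pop()])
```

The third pop returns `None` because the oldest entry, 0x104, was discarded when the third push overflowed the two-entry stack.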
Looking again at FIG. 1, when the branch-and-link at A is executed, the current location of the link stack (along with the link register) is updated with A+4. When the branch-and-link at B is executed, the current entry of the link stack is updated with B+4, and the previous current entry, the entry containing A+4, is pushed one entry deeper into the stack. Upon executing the branch-to-link at C, the process of returning from subroutine HERE2 causes the entry containing B+4 to be popped off the stack, and the entry containing A+4 becomes the current entry. Upon returning from HERE1, the branch-to-link at B+x causes the entry containing A+4 to be popped off the stack. It is important to note that if the link register value was not available when executing the branch-to-link at B+x, because the load instruction had not yet brought in the value, the A+4 value popped from the link stack would be predicted to be the next instruction fetch address. In this particular example, the prediction would be correct, and the processor would be able to accomplish useful work while waiting for the load to finish.
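The walkthrough above can be traced with a plain Python list acting as the link stack. A and B are the same hypothetical addresses used for FIG. 1 elsewhere, and the depth of 8 is the example depth from the text. The helper function names are hypothetical.

```python
WORD = 4
A, B = 0x100, 0x408    # hypothetical addresses of the two BL instructions
DEPTH = 8              # example link stack depth from the text

stack = []             # link stack; stack[-1] is the current location


def on_branch_and_link(addr: int) -> None:
    stack.append(addr + WORD)  # same value that is written to the LR
    if len(stack) > DEPTH:
        stack.pop(0)           # the deepest entry falls off a full stack


def on_branch_to_link():
    return stack.pop() if stack else None  # predicted fetch address


on_branch_and_link(A)          # A: BL HERE1, stack = [A+4]
on_branch_and_link(B)          # B: BL HERE2, stack = [A+4, B+4]
pred_c = on_branch_to_link()   # C: pops B+4; A+4 becomes current
pred_bx = on_branch_to_link()  # B+x: LR load still pending, predicts A+4
print(hex(pred_c), hex(pred_bx))
```

Both predictions equal the values that eventually reach the link register, so fetching can proceed while the load at the end of HERE1 is still outstanding.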
The use of link stacks to predict branches in this fashion provides the ability to speculatively continue execution until the link register is updated and the prediction can be verified. If the prediction was correct, the program flow continues normally. Otherwise, in the event an inaccurate prediction has been made, the speculative instruction path will be discarded and a branch to the actual link address occurs.
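The verification step can be reduced to a comparison once the link register value arrives. The helper `resolve` below is hypothetical. It returns `None` when the speculative path can be kept, and otherwise returns the address from which fetching must restart after the speculative instructions are discarded.

```python
from typing import Optional


def resolve(predicted: int, actual_lr: int) -> Optional[int]:
    """Check a link-stack prediction against the LR value once it is known.

    Returns None if the prediction is verified (program flow continues
    normally); otherwise returns the actual link address to branch to.
    """
    if predicted == actual_lr:
        return None        # correct: keep the speculatively fetched path
    return actual_lr       # incorrect: discard the speculative path, redirect


print(resolve(0x104, 0x104), hex(resolve(0x104, 0x208)))
```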
The use of a link stack works well until a return from a subroutine directs flow to a routine from which it was not called, or a branch-to-link is used as a jump rather than as a subroutine return. For example, FIG. 2 illustrates an initial program flow from a routine A into a subroutine B, from within subroutine B to subroutine C, and likewise through subroutine G. The prior art works well if the return flow from G proceeds back to F. However, when subroutine G does not return to its calling routine F, the sequence of returns is out of order. The return from subroutine G to subroutine E represents such an out-of-order sequence. In this case, the link stack directs prediction to return to subroutine F, which will be incorrect.
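The FIG. 2 case can be sketched by pushing one entry per call in the chain A through G and then comparing the link stack's guess with a return into E. For simplicity, string labels stand in for real return addresses, and the function name is hypothetical.

```python
def fig2_out_of_order_return():
    """Calls A->B->...->G, then G returns directly into E (FIG. 2)."""
    stack = []
    routines = "ABCDEFG"
    # Each call pushes the return point in the caller (A through F).
    for caller in routines[:-1]:
        stack.append(f"return point in {caller}")
    predicted = stack.pop()         # link stack's guess for G's return
    actual = "return point in E"    # G returns to E instead of F
    return predicted, actual


predicted, actual = fig2_out_of_order_return()
print(predicted, "->", actual, "(mispredicted)" if predicted != actual else "(ok)")
```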
The prior art link stack (not shown) is also no longer advantageous when a branch-to-link instruction is used as a jump rather than as a subroutine return. This happens when a program performs a branch-to-link to implement a jump to another part of the subroutine. In this case, the address provided by the link stack for the prediction will be incorrect. Upon detection of the misprediction, the entire link stack is invalidated, so all information in the stack is lost. Therefore, a data processor capable of overcoming the prior art problem of losing all link stack information upon a misprediction would be desirable.
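The loss described above can be sketched with a link stack that clears itself on any misprediction. The class name, the call chain, and the addresses (0x104 as the return point in A, 0x40C as the return point in B, 0x820 as a jump target inside C) are all hypothetical. In this scenario, C uses a branch-to-link as a jump.

```python
class PriorArtLinkStack:
    """Link stack that invalidates every entry when a misprediction is detected."""

    def __init__(self) -> None:
        self.entries = []

    def push(self, addr: int) -> None:
        self.entries.append(addr)

    def predict(self):
        return self.entries.pop() if self.entries else None

    def verify(self, predicted, actual: int) -> bool:
        if predicted != actual:
            self.entries.clear()  # the entire link stack is invalidated
            return False
        return True


ls = PriorArtLinkStack()
ls.push(0x104)                  # A calls B: return point in A
ls.push(0x40C)                  # B calls C: return point in B
guess = ls.predict()            # C uses branch-to-link as a jump to 0x820
hit = ls.verify(guess, 0x820)   # 0x40C != 0x820, so a misprediction
print(hit, ls.entries, ls.predict())
```

After the invalidation, the entry 0x104 for the pending return into A is gone as well. Any later branch-to-link finds no prediction and must wait for the link register.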