Microprocessors perform computational tasks in a wide variety of applications. Improved processor performance is almost always desirable, to allow for faster operation and/or increased functionality through software changes. In many embedded applications, such as portable electronic devices, conserving power is also an important goal in processor design and implementation.
Many modern processors employ a pipelined architecture, where sequential instructions are overlapped in execution to increase overall processor throughput. Maintaining smooth execution through the pipeline helps achieve high performance. Most modern processors also utilize a hierarchical memory, with fast, on-chip cache memories storing local copies of recently accessed data and instructions.
Real-world programs include indirect branch instructions, the actual branching behavior of which is not known until the instruction is actually evaluated deep in the execution pipeline. Most modern processors employ some form of branch prediction, whereby the branching behavior of indirect branch instructions is predicted early in the pipeline, such as during a fetch or decode pipe stage. Utilizing a branch prediction technique, the processor speculatively fetches the target of the indirect branch instruction and redirects the pipeline to begin processing the speculatively fetched instructions. When the actual branch target is determined in a later pipe stage such as an execution pipe stage, if the branch was mispredicted, the speculatively fetched instructions must be flushed from the pipeline, and new instructions fetched from the correct target address. Prefetching instructions in response to an erroneous branch target prediction adversely impacts processor performance and power consumption.
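The predict, resolve, and flush sequence described above can be sketched as a small model. This is only an illustration: the function name, its parameters, and the fixed 4-byte instruction size are assumptions introduced here, not details taken from any particular processor.

```python
INSN_SIZE = 4  # assumed fixed instruction width, for illustration only


def resolve_indirect_branch(predicted_target, actual_target, speculative_fetches):
    """Model the resolve step for one indirect branch (hypothetical helper).

    predicted_target: address guessed early in the pipeline (fetch or decode).
    actual_target: address determined later, at the execute stage.
    speculative_fetches: addresses already fetched along the predicted path.
    Returns (next_fetch_address, flushed), where flushed lists discarded fetches.
    """
    if predicted_target == actual_target:
        # Correct prediction: speculative work is kept and fetch continues in order.
        if speculative_fetches:
            return speculative_fetches[-1] + INSN_SIZE, []
        return actual_target, []
    # Misprediction: everything fetched down the wrong path is discarded,
    # and fetch restarts at the correct target address.
    return actual_target, list(speculative_fetches)


next_addr, flushed = resolve_indirect_branch(0x1000, 0x2000, [0x1000, 0x1004])
print(hex(next_addr), [hex(a) for a in flushed])
```

The length of the flushed list stands in for the wasted fetch work that costs both performance and power.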
One example of an indirect branch instruction is a branch instruction used to return from a subroutine. For example, a return call from a subroutine may include a branch instruction whose return address is defined by the contents of a register. A return address defines the next instruction to be fetched after the subroutine completes and is commonly the instruction after the branch instruction from which the subroutine was originally called. Many high-performance architectures designate a particular general-purpose register for use in subroutine returns, commonly referred to as a link register.
For convenience, a return call may also be referred to as a branch return instruction. In order for a processor pipeline to utilize branch prediction for a branch return instruction, conventional software includes an explicit subroutine call, such as a branch and link instruction, to record the return address in the link register. Many high-performance implementations include a link stack structure at the decode stage of processing the branch and link instruction. Link return values are pushed onto this stack to allow for accurate branch prediction when the corresponding subroutines return. Conventional link stack structures contain a list of return addresses in order to support multiple subroutine calls flowing through a pipeline and to support the nesting of multiple levels of subroutine calls. Subsequently, when the branch return instruction within the subroutine is decoded, the return address is read from the link stack structure and used as the predicted target address if other branch prediction hardware dictates that the processor should redirect the pipeline. If the predicted result indicates a redirect, the pipeline begins fetching instructions from the return address that was read from the link stack structure.
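The push-on-call, pop-on-return behavior of a conventional link stack can be sketched as follows. The class name, method names, stack depth, overflow policy, and 4-byte instruction size are all assumptions made for this illustration; real link stack hardware differs in its details.

```python
class LinkStack:
    """Minimal model of a conventional link stack (illustrative only)."""

    def __init__(self, depth=8):
        self.depth = depth      # assumed capacity
        self.entries = []       # most recent return address is last

    def on_branch_and_link(self, call_addr, insn_size=4):
        """Decode of a branch and link: push the address after the call."""
        return_addr = call_addr + insn_size
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # assumed policy: drop the oldest entry when full
        self.entries.append(return_addr)
        return return_addr

    def on_branch_return(self):
        """Decode of a branch return: pop the predicted return address."""
        return self.entries.pop() if self.entries else None


# Two nested calls: the inner subroutine returns first.
stack = LinkStack()
stack.on_branch_and_link(0x100)   # outer call
stack.on_branch_and_link(0x200)   # nested call inside the outer subroutine
inner_prediction = stack.on_branch_return()
outer_prediction = stack.on_branch_return()
print(hex(inner_prediction), hex(outer_prediction))
```

Because entries come off in last-in, first-out order, nested subroutine levels unwind with the correct predicted return address at each level, provided every call was pushed.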
However, there exist many compilers and much legacy code that do not generate or incorporate conventional branch and link instructions when calling a subroutine. In those situations, the link stack structure is not utilized, and its integrity is compromised. For example, the return address popped from the link stack structure may not correspond to the return instruction that triggered the pop in the first place. One effect of a compromised link stack structure is an increase in mispredictions on return instructions. Furthermore, in those situations where a subroutine call is not recognized in a program segment, the problem is compounded because branch prediction hardware may not be utilized to populate the link stack structure on subsequent unrecognizable subroutine calls. By way of example, refer to the following table containing a code segment which would run on an ARM Ltd. compatible processor:
TABLE 1
Code Segment

Address       Instruction
0x00899808    LDR LR, 0x00899818
0x0089980C    ADD
0x00899810    SUB
0x00899814    BR 0x00990000
0x00899818    INSTRA
0x0089981C    INSTRB
. . .
0x00990000    ADD
0x00990004    SUB
0x00990008    MOV
0x0099000C    BX LR
The program flow of the code segment in Table 1 includes processing the instructions in sequential order starting at address 0x00899808 and through to address 0x00899814. At address 0x00899814, a branch instruction changes the program flow so that the next instruction processed is located at address 0x00990000, the start of a subroutine.
The combination of setting the link register (i.e., LDR LR, 0x00899818) and the branch instruction (i.e., BR) prepares the processor for a subsequent branch to a subroutine. In this example, the actual subroutine to which the call is made begins at address 0x00990000 and ends at address 0x0099000C. The LDR LR, 0x00899818 instruction indicates that address 0x00899818 should be copied into the link register (LR), which stores the return address, 0x00899818, in the link register. At the end of the subroutine, the return address is retrieved from the link register. More specifically, the return address is retrieved when executing BX LR, the branch return instruction. Other code segments that imply a subroutine call also exist, and they include instructions that modify the link register, such as the sequential combination of the instructions MOV LR, PC and BR [A], where [A] is the address of the beginning of a subroutine.
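The flow through Table 1 can be traced with a toy interpreter that also keeps a conventional link stack, one that is pushed only by an explicit branch and link. The opcode tags (`LDR_LR`, `BR`, `BL`, `BX_LR`), the `trace` function, and the 4-byte instruction step are hypothetical names and assumptions introduced for this sketch; the non-branch instructions are treated as opaque.

```python
# Table 1 encoded as address -> (opcode tag, operand). Tags are illustrative.
TABLE_1 = {
    0x00899808: ("LDR_LR", 0x00899818),
    0x0089980C: ("ADD", None),
    0x00899810: ("SUB", None),
    0x00899814: ("BR", 0x00990000),
    0x00899818: ("INSTRA", None),
    0x0089981C: ("INSTRB", None),
    0x00990000: ("ADD", None),
    0x00990004: ("SUB", None),
    0x00990008: ("MOV", None),
    0x0099000C: ("BX_LR", None),
}


def trace(program, start, stop):
    """Execute from start until pc reaches stop; return (visited, prediction, pc)."""
    pc, lr = start, None
    link_stack = []      # conventional: pushed only by an explicit branch and link
    visited = []
    prediction = None    # what the link stack offered when the return was decoded
    while pc != stop:
        op, arg = program[pc]
        visited.append(pc)
        if op == "LDR_LR":
            lr = arg                     # return address written directly to LR
            pc += 4
        elif op == "BL":
            lr = pc + 4                  # explicit call: LR and link stack both updated
            link_stack.append(lr)
            pc = arg
        elif op == "BR":
            pc = arg                     # plain branch: link stack is not touched
        elif op == "BX_LR":
            prediction = link_stack.pop() if link_stack else None
            pc = lr                      # actual return target comes from LR
        else:
            pc += 4                      # opaque non-branch instruction
    return visited, prediction, pc


visited, prediction, final_pc = trace(TABLE_1, 0x00899808, 0x00899818)
print([hex(a) for a in visited], prediction, hex(final_pc))
```

In this model the return lands at 0x00899818 because LR was loaded explicitly, yet the link stack has nothing to offer when BX LR is decoded, since no branch and link was ever seen. That is the situation in which a conventional link stack cannot supply a return prediction.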