1. Field of the Invention
The invention generally relates to the field of computers and, more particularly, to computer architecture.
2. Description of the Related Art
Processor instruction set architectures typically implement a variety of control transfer instructions (CTIs) including conditional and unconditional branches, calls, jumps, conditional traps, etc. In pipelined processor implementations, the execution of control transfer instructions can result in certain inefficiencies because instructions that follow a branch or other control transfer (in an expected or predicted execution sequence) may need to be flushed from the pipeline if the actual execution path diverges from that expected or predicted. In such cases, instructions along the actual execution path of the branch need to enter the pipeline for processing. The resulting pipeline bubble wastes processor cycles.
One architectural technique that has been employed to reduce this inefficiency is to delay the effect of the control transfer instruction and to treat an instruction that immediately follows a control transfer instruction as if it logically preceded the delayed control transfer instruction. Instructions that are so treated are often said to reside in the “delay slot” of a “delayed control transfer instruction.” In this way, the size of the bubble is reduced (though not necessarily eliminated) and at least some of the otherwise wasted pipeline stages and processing cycles may be used productively.
SPARC instruction set processors traditionally implement an instruction set architecture that contemplates delay slot instructions. SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
As a specific example, the SPARC® Version 9 ISA includes five basic control transfer instruction types: a conditional branch, a call and link (CALL), a jump and link (JMPL), a return from trap (RETT) and a trap. In the SPARC Version 9 ISA, a delayed control transfer instruction such as a branch, when taken, causes the processor to transfer control to an instruction at a target address after a one instruction delay. In the usual case, the delay slot instruction (i.e., the instruction following the control transfer instruction) is executed after the control transfer instruction is executed and before control actually transfers to the target of the control transfer instruction.
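The one instruction delay described above can be illustrated with a minimal sketch. The toy program, its labels, and the function name run_delayed are hypothetical and are not SPARC encodings; the sketch assumes every branch is taken and that the delay slot never holds another control transfer instruction.

```python
# Hypothetical toy program: each entry is (label, taken_branch_target_or_None).
# List indices stand in for instruction addresses; this is not real SPARC code.
PROGRAM = [
    ("add", None),      # 0
    ("branch", 4),      # 1: taken delayed branch to index 4
    ("delay", None),    # 2: delay slot instruction
    ("skipped", None),  # 3: never reached
    ("target", None),   # 4: branch target
    ("after", None),    # 5
]

def run_delayed(program):
    """Return instruction labels in execution order with a one-instruction delay."""
    trace = []
    i = 0
    while i < len(program):
        label, target = program[i]
        trace.append(label)
        if target is not None:
            # The transfer is delayed: the delay slot instruction executes
            # first, and only then does control move to the target.
            # Assumption: the delay slot is not itself a branch.
            trace.append(program[i + 1][0])
            i = target
        else:
            i += 1
    return trace

print(run_delayed(PROGRAM))
# ['add', 'branch', 'delay', 'target', 'after']
```

The delay slot instruction appears in the trace between the branch and its target, while the instruction at index 3 is never executed.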
If the instruction in the delay slot of a delayed control transfer instruction (DCTI) is itself a DCTI, then processing can be more complicated and may be subject to special rules imposed by an instruction set architecture. For example, in SPARC-based architectures a pair of successive DCTIs (i.e., a DCTI couple) is handled as follows. Both control transfer instructions are executed (but not the instruction in the delay slot of the second DCTI) and, assuming that both branches are taken, control is transferred briefly to the destination of the first and then immediately to the destination of the second. The goal of such a special rule is to simplify processing of what could otherwise be a very complex hierarchy of branch conditions and targets. Other simplifying rules may be employed in other architectures.
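The DCTI couple behavior described above falls out naturally if execution is modeled with a program counter and a next program counter, as in the following sketch. The addresses, the dictionary layout, and the function name trace_couple are hypothetical; the sketch assumes 4-byte instructions and that both transfers are taken.

```python
# Hypothetical toy program keyed by instruction address (4-byte instructions).
# A value is the target of an always-taken delayed transfer, or None otherwise.
PROGRAM = {
    0x00: 0x40,  # first DCTI, target 0x40
    0x04: 0x80,  # second DCTI, in the delay slot of the first, target 0x80
    0x08: None,  # delay slot of the second DCTI: not executed
    0x40: None,  # destination of the first DCTI
    0x44: None,  # not executed
    0x80: None,  # destination of the second DCTI
    0x84: None,
}

def trace_couple(program, start, steps):
    """Return the addresses executed, using a PC / next-PC pair."""
    pc, npc = start, start + 4
    executed = []
    for _ in range(steps):
        executed.append(pc)
        target = program[pc]
        # A taken delayed transfer replaces the address after next;
        # the instruction already at npc still executes.
        new_npc = target if target is not None else npc + 4
        pc, npc = npc, new_npc
    return executed

print([hex(a) for a in trace_couple(PROGRAM, 0x00, 5)])
# ['0x0', '0x4', '0x40', '0x80', '0x84']
```

In this trace exactly one instruction at the destination of the first DCTI (0x40) executes before control moves to the destination of the second (0x80), and the instruction at 0x08 is never executed.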
In general, when a DCTI couple straddles a cache line boundary, a relatively complex pipeline can be required to ensure a desired instruction fetch behavior. Moreover, even when DCTI couples are encountered that do not cross cache line boundaries, relatively complex processing may be employed to conform processor behavior with special rules imposed by an instruction set architecture.
Due to the increasing operating frequencies of pipelined processors and increased depth of pipelines and speculation characteristic of some modern processor architectures, it can be difficult to design a processor pipeline front-end that can handle (in a timely manner) operations in support of delayed control transfer instructions. In some cases, instruction fetch bandwidth may be adversely affected and pipeline stalls may result. Unfortunately, proper execution of delayed control transfer instructions (including DCTI couples) may be required for instruction set compatibility. In particular, legacy code may exploit DCTI and delay slot instruction code constructs. As a result, it may not be acceptable to alter long-standing instruction set behaviors and conventions, even if such behaviors and conventions tend to limit performance.
What is needed are techniques for reducing the complexity of a processor pipeline front-end while still supporting DCTIs.