The present invention relates generally to the field of processors and in particular to a system and method of executing conditional instructions architected to not provide an output if a condition is not satisfied.
Microprocessors perform computational tasks in a wide variety of applications. A common goal in microprocessor design is improved performance, allowing for faster operation and/or increased functionality through software evolution. Many modern processors employ a pipelined architecture, where sequential instructions, each having multiple execution steps, are overlapped in execution. For higher instruction throughput, the instructions should flow continuously through the pipeline. Any situation that causes instructions to stall in the pipeline detrimentally affects instruction throughput and accordingly processor performance.
Instructions operate on data obtained from, and write their results to, memory. Modern processors utilize a hierarchical memory structure comprising a few fast, expensive memory elements, such as registers, at the top level. The memory hierarchy then comprises successively slower but less costly memory technologies at lower levels, such as cache memories (SRAM), solid-state main memory (DRAM), and disks (magnetic or optical media), respectively. For applications, such as portable electronic devices, DRAM is often the lowest level of the memory hierarchy.
Many processor instruction set architectures (ISA) include a set of General Purpose Registers (GPRs), which are architected registers used to pass data between instructions, and to and from memory. Instructions that perform logical and arithmetic operations on data read their operands from, and write their results to, specified GPRs. Similarly, memory access instructions read data to be written to memory from GPRs, and write data read from memory to GPRs. A compiler assigns a source and target GPR identifiers to each instruction, and orders the instructions, such that the proper results are calculated. That is, instructions are arranged in “program order” that guarantees correct results by directing earlier instructions to store results in specified GPRs, and directing later instructions to read those GPRs to obtain operands for further processing.
However, many processors execute instructions “out-of-order”—that is, in other than the instructions' program order. One example of a case where out-of-order execution arises is in superscalar designs, wherein two or more instructions may be executed in parallel, in different execution pipelines. If an instruction stalls in one pipeline, following instructions may be dispatched to a free pipeline for immediate execution. Out-of-order execution is not limited to superscalar designs, and may occur in single-issue designs. In either case, out-of-order execution presents certain problems.
Independent instructions may be executed without regard to original program order. However, many instructions exhibit a dependence on other instructions, known as “hazards.” Data hazards arise when the reordering of instructions (such as that caused by out-of-order instruction issue) would change the order of access to the operand involved in the dependence. Data hazards of concern may be classified into three types. Consider two instructions, i and j, with i occurring before j in program order.
One type of data hazard is a Read after Write (RaW) hazard. This occurs when i writes a target register that is an operand source for j. If j attempts to read the register before i writes it, j improperly retrieves an old value. A Write after Write (WaW) hazard occurs when both instructions i and j write to the same target register, and j attempts to write its target before it is written by i. In this case, the writes are performed in the wrong order, leaving the value written by i in the register rather than the value written by j. A Write after Read (WaR) hazard occurs when j writes to a target register that is an operand source for i, prior to i reading the register. This causes i to incorrectly retrieve the new value, written by j, rather than the correct value, written by a previous instruction. Note that the Read after Read (RaR) case is not a data hazard; reads may be performed in any order.
For example, consider the following representative code fragment:
ANDr2, r10, r12logical AND the contents of r10 to thecontents of r12, place the result in r2STr2, memstore the contents of r2 to memory location memADDr2, r5, r6add contents of r5 and r6, place sum in r2
The ST has a RaW dependency on the AND. This data hazard requires that these instructions be executed in program order. Additionally, the ADD exhibits a WaR dependency on the ST. Semantically, the ADD cannot write its results to r2 until after the ST has completed its read of r2. Otherwise, the ST will write the result of the ADD to memory, when it should write the result of the AND. WaR and WaW data hazards are name dependencies, not true data dependencies, since no data is being passed from one instruction to the next (the only dependency is that one instruction not corrupt the contents of a register that the other instruction will read or has written). Name dependencies may be resolved by a technique known as register renaming.
In a register renaming system, a large set of physical registers, each having a physical register number (PRN), is managed by dynamically assigning logical register numbers (LRNs) to the physical registers. The LRNs may comprise, for example, the logical GPR identifiers (r0, r1, r2, . . . ). Preferably, the number of physical registers is greater than the number of LRNs, or architected GPRs. A Renaming Table (RT) maintains the dynamic mapping between LRNs and PRNs.
Early in the pipeline (e.g., at or following a decode stage), the register access characteristics of an instruction are inspected, and the LRNs (e.g., GPR identifiers) associated with the instruction are translated to PRNs via the RT. For instructions that write a register, a new LRN-to-PRN mapping is entered in the RT, mapping the LRN to an unused PRN, such that the write is directed to an associated physical register (that is, the LRN is “renamed”). Instructions that read a register translate their LRN to a PRN via a RT lookup. The PRN remains associated with the register-reading instruction throughout its tenure through the pipeline.
Register-writing instructions do not “corrupt” prior values written to the same LRN; the write is directed to a new, unused PRN (as the LRN is renamed to a new PRN). Instructions that follow the writing instruction in program order will be directed to the same PRN, to obtain the written value. Instructions preceding the writing instruction in program order were mapped by the RT to a different physical register (prior to the renaming operation), and will continue to access that physical register. Thus, instructions that write a given LRN may be executed ahead of instructions that read a prior value from the LRN (WaR) or write a prior result to the LRN (WaW). In this manner, the WaR and WaW name hazards are avoided.
To allow the processor to recover from an exception, a mispredicted branch, or the like, restrictions are placed on the availability (for further renaming) of physical registers to which data are written. For example, a LRN may be renamed to PRN1, and the result of a first instruction written to the LRN, and hence to PRN1. A second instruction may also write data to the LRN, which is renamed to PRN2, and hence PRN2 stores the second instruction's result. In this case, PRN1 is not free for another LRN to be renamed to it until the second instruction commits (meaning it, and all instructions ahead of it, have been fully exception-checked and are assured of completing execution). In addition, all instructions between the first and second instruction that reference the LRN (that is, all instructions that read PRN1) must have completed a read of PRN1 or otherwise be guaranteed of eventually receiving that value. Only then can PRN1 be released, and made available for another LRN to be renamed to it.
Returning to the example code fragment above, when the AND instruction is decoded, and its write to LRN r2 detected, the LRN r2 is assigned in the RT to a physical register, say PRN x. The result of the AND is thus written to physical register x. When the ST instruction is decoded, its read from LRN r2 is detected, and the RT is accessed. LRN r2 is mapped to PRN x, so the ST instruction will read physical register x (thus obtaining the result written by the AND). When the ADD instruction is decoded, and its write to LRN r2 detected, the LRN r2 is re-assigned—or “renamed”—to a different physical register, say PRN y. Subsequent instructions that read LRN r2 will be directed by the RT to physical register y. Note that the ADD may execute prior to the ST; the ST will still retrieve the correct result from PRN x, thus the WAR hazard is resolved.
One problem with a register renaming system arises from the execution of conditional instructions. Conditional instructions are instructions that are architected to perform an arithmetic or logical operation and write the result, only if a condition is satisfied. Until the condition is evaluated (which often occurs deep in the pipeline), it cannot be determined whether a conditional instruction will write a register. If the condition is not satisfied, the conditional instruction is effectively a NOP, or a non-operation instruction, which does not alter any GPR. Because of the uncertainty whether a conditional instruction will write a register or not, subsequent instructions cannot ascertain whether a dependency on the conditional instruction exists until the condition is evaluated. For example, consider the following code fragment:
CMPr1, r12compare contents of r1 and r12 (set code orflag to reflect the result of the comparison)ANDr2, r10, r12logical AND the contents of r10 to thecontents of r12, place the result in r2SUBEQr2, r7, r8if the previous compare was equal, subtractcontents of r8 from r7 and place result in r2.Otherwise, r2 is not changedSTr2, memstore the contents of r2 to memory location memADDr2, r5, r6add contents of r5 and r6, place sum in r2
In this example, the ST cannot ascertain whether it has a data hazard with respect to the SUBEQ or not, until the EQ condition is evaluated. That is, the ST cannot determine if r2 will be written by the AND or SUBEQ instruction. Semantically, and actually in a processor that always issues instructions in program order, both the AND and SUBEQ are always executed, and the SUBEQ may or may not update the value of the r2 register; the ST doesn't “care” and will simply store the contents of r2. However, in an out-of-order design, the processor must determine whether the ST dependency is on the AND or the SUBEQ.
In particular, in a register renaming system, the processor must stall the pipeline early, at the register renaming stage, until the EQ condition is evaluated and it can be determined whether the SUBEQ will actually write r2. Since the condition is evaluated deep in the pipeline, this incurs undesirable pipeline stall. Alternatively, the RT may speculatively rename r2 to a new PRN for the SUBEQ instruction. In this case, the RT must have a mechanism to undo the renaming, i.e., to restore the mapping of the LRN r2 to the previously named PRN, if the EQ condition is not satisfied. This is necessary since, if the condition is not satisfied, the SUBEQ does not actually write the new PRN, and the RT would be left mapping LRN r2 to a physical register containing undefined data. This additional circuitry adds complexity and increases power consumption in the RT.
One type of instruction known in the art, the output of which is dependent on a condition evaluation, is a conditional select. A conditional select instruction is defined by the instruction set architecture to always (i.e., unconditionally) write a result. Only the value written, not whether an output is written, is dependent on the condition evaluation. For example, a conditional select ADD instruction, ADDSEQ r2, r3, r4, r5 may add the contents of r4 and r5, and place the result in r2, if the EQ condition is met. If the EQ condition is not met, the instruction places the contents of r3 in r2. Note that this instruction architecturally and semantically has an explicit input for the alternative result (in this example, a read of r3). The output then selects between the result of an operation or the alternate result, depending on the condition evaluation.
A conditional instruction is distinct from a conditional select. As used herein, a “conditional instruction” is an instruction that is architected to write the result of an operation to a target if a condition is satisfied, and to not write the target if the condition is not satisfied. That is, if the condition fails, the conditional instruction effectively converts to a NOP, and does not write any register or forward any result in an operand forwarding environment. It is precisely this uncertainty whether a conditional instruction will provide an output or not, that forces the processor to stall the pipeline at the register renaming stage (which generally occurs early in the pipeline) until the condition is evaluated (which often occurs late in the pipeline). Only when it is known whether or not the conditional instruction is a de facto NOP can data dependencies of following instructions be resolved. This type of conditional instruction is common in modern processor ISAs.