1. Field of the Invention
The invention relates to the field of microprocessors, and more particularly to instructions for evaluating conditions and moving data.
2. Art Background
Computer programs are composed of multiple instructions which are decoded and executed by the microprocessor. When the program is "run", the microprocessor begins executing the instructions in the sequence in which they appear in the program. Typically, not all of the instructions in the program are executed when the program is run. Rather, some instructions are conditional; whether or not they are executed depends upon the outcome of a previous instruction in the program. In other words, the microprocessor may skip the execution of some instructions if certain conditions are not met while the program is running.
Table 1a shows both a set of high-level instructions and a set of assembly-level instructions for implementing a conditional statement that moves data to a destination location inside a computer system. The instructions on the left side of the table are from a high-level programming language, in this example the C language. The instructions are stored in a machine-readable medium, such as computer memory. A high-level language compiler, such as a C compiler, is used to compile the high-level instructions into assembly instructions compatible with the particular computer system on which the instructions are to execute. After compilation, each high-level instruction typically corresponds to one or more assembly-level instructions, although it is possible for one assembly instruction to correspond to multiple high-level instructions. This process is well known in the art. In Table 1a, letters represent program variables which may be kept in registers within the microprocessor or in memory.
TABLE 1a
______________________________________
High-Level Instructions    Assembly-Level Instructions
______________________________________
IF (Y>5)                   100      CMP R1=(Y>5)
    A=2                    110      CJMP FALSE R1, J1
ELSE                       120      MOV A, 2
    A=9                    130      JMP J2
                           140 J1:  MOV A, 9
                           150 J2:
______________________________________
As shown by the assembly-level instructions in Table 1a, a compare instruction 100 is executed to determine whether the condition Y>5 is TRUE. Following the compare instruction, a conditional jump instruction 110 is executed. If the condition Y>5 evaluates to logical TRUE, the conditional jump instruction 110 does not cause a jump, and a MOV instruction 120 is executed. The MOV instruction 120 moves the value 2 to the variable A. Upon completion of MOV instruction 120, a jump instruction 130 is executed to jump over a MOV instruction 140 to an address 150. If the condition Y>5 does not evaluate to logical TRUE, the conditional jump instruction 110 causes a jump to the MOV instruction 140. The MOV instruction 140 moves the value 9 to the variable A. Note that the implementation of what defines "TRUE" will vary among computer systems, but typically involves the setting of one or more bits in a register.
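The control flow of Table 1a can be sketched in Python as a tiny interpreter that follows a program counter through addresses 100 to 150. This is a minimal illustration under stated assumptions, not a model of any particular processor: the dictionary encoding, the opcode name CJMP_FALSE and the function run_table_1a are all hypothetical. It returns the final value of A together with the addresses actually executed, which makes visible that the two outcomes of the comparison follow different instruction paths.

```python
def run_table_1a(y):
    """Run the branching sequence of Table 1a; return (A, executed addresses)."""
    # Hypothetical encoding: address -> (opcode, argument).
    program = {
        100: ("CMP", None),         # R1 = (Y > 5)
        110: ("CJMP_FALSE", 140),   # jump to J1 (address 140) if R1 is FALSE
        120: ("MOV", 2),            # A = 2
        130: ("JMP", 150),          # unconditional jump to J2 (address 150)
        140: ("MOV", 9),            # J1: A = 9
        150: ("END", None),         # J2: end of the sequence
    }
    addresses = sorted(program)     # sequential (fall-through) order in memory
    r1, a = False, None
    executed = []
    pc = 100
    while program[pc][0] != "END":
        op, arg = program[pc]
        executed.append(pc)
        next_pc = addresses[addresses.index(pc) + 1]  # default: next instruction
        if op == "CMP":
            r1 = y > 5
        elif op == "CJMP_FALSE" and not r1:
            next_pc = arg           # branch Taken
        elif op == "MOV":
            a = arg
        elif op == "JMP":
            next_pc = arg
        pc = next_pc
    return a, executed


if __name__ == "__main__":
    print(run_table_1a(7))  # Y > 5 is TRUE
    print(run_table_1a(3))  # Y > 5 is FALSE
```

For example, run_table_1a(7) returns (2, [100, 110, 120, 130]), while run_table_1a(3) returns (9, [100, 110, 140]).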
TABLE 1b
______________________________________
High-Level Instructions    Assembly-Level Instructions
______________________________________
IF (Y>5)                   155      CMP R1=(Y>5)
    A=2                    156      CMOV R1, A=2
ELSE                       157      CMOV !R1, A=9
    A=9
______________________________________
Table 1b shows both a set of high-level instructions and a set of assembly-level instructions for implementing a conditional move in a computer system. In Table 1b, as in Table 1a, letters represent program variables which may be kept in registers within the microprocessor or in memory. As shown by the assembly-level instructions in Table 1b, a compare instruction 155 is executed to determine whether the condition Y>5 is TRUE. Following the compare instruction, a conditional move instruction 156 is executed. If the condition Y>5 evaluates to logical TRUE, the conditional move instruction 156 moves the value 2 to the variable A. Upon completion of conditional move instruction 156, a conditional move instruction 157 is executed. Conditional move instruction 157 evaluates the complement of the value in register R1. If the condition Y>5 does not evaluate to logical TRUE, the conditional move instruction 157 moves the value 9 to the variable A. Unlike the sequence in Table 1a, no jump instruction is needed: instructions 155, 156 and 157 are executed in order regardless of the outcome of the comparison.
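For comparison, the Table 1b sequence can be sketched under the same hypothetical conventions (run_table_1b is an illustrative name). The Python if statements model the predicate evaluated inside each CMOV instruction, not branches in the instruction stream, so the list of executed addresses is the same for every value of Y.

```python
def run_table_1b(y):
    """Run the conditional-move sequence of Table 1b; return (A, executed addresses)."""
    executed = []
    a = None            # A's prior value is not specified in Table 1b

    r1 = y > 5          # 155 CMP R1=(Y>5)
    executed.append(155)

    if r1:              # 156 CMOV R1, A=2: the move takes effect only if R1 is TRUE
        a = 2
    executed.append(156)

    if not r1:          # 157 CMOV !R1, A=9: the move takes effect only if R1 is FALSE
        a = 9
    executed.append(157)

    return a, executed


if __name__ == "__main__":
    print(run_table_1b(7))
    print(run_table_1b(3))
```

Here run_table_1b(7) returns (2, [155, 156, 157]) and run_table_1b(3) returns (9, [155, 156, 157]): the result differs, the instruction path does not.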
In a pipelined computer, multiple instructions from a computer program are processed simultaneously, each in a different stage of the pipeline, leading to substantial performance increases. Each instruction in a typical pipelined computer goes through four stages: fetch, decode, execute, and writeback. In the fetch stage, the processor fetches an instruction from an instruction memory, which can be a cache memory, main RAM, flash memory, or another type of machine-readable medium. During the decode stage, the instruction is partially or fully decoded to determine the type of execution unit required to complete the instruction, and the instruction's operands are read from registers. During the execute stage, the operation specified by the instruction is performed; on many pipelined processors, simple integer arithmetic and logic operations complete in one clock cycle. During the writeback stage, the results of the instruction's execution are written back to registers or to other locations in computer memory, such as cache memory or main RAM.
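The overlap of the four stages can be illustrated with an idealized schedule. The sketch below assumes that one instruction enters the pipeline per clock cycle, that every stage takes exactly one cycle, and that there are no stalls or branches; STAGES and pipeline_schedule are illustrative names.

```python
STAGES = ("fetch", "decode", "execute", "writeback")


def pipeline_schedule(num_instructions):
    """Map each clock cycle to {instruction index: stage} for an ideal pipeline.

    Instruction i enters the fetch stage in cycle i and advances one stage per
    cycle, so it occupies stage s in cycle i + s.
    """
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s, {})[i] = stage
    return schedule


if __name__ == "__main__":
    for cycle, slots in sorted(pipeline_schedule(3).items()):
        print(cycle, slots)
```

With three instructions, cycle 2 holds the first instruction in execute, the second in decode and the third in fetch, and all three finish after six cycles rather than the twelve that a non-pipelined machine with the same stage timing would need.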
A pipelined computer operates most efficiently when the instructions are executed in the sequence in which they appear in memory. Unfortunately, conditional statements and the branch instructions they produce constitute a large portion of the executed instructions in a computer program. When a branch instruction is executed, execution continues either with the next sequential instruction, or with an instruction at a specified "target" address. A branch instruction is said to be "Taken" if execution continues at the target address, or "Not Taken" if execution continues with the next sequential instruction in memory. For example, in Table 1a the target of the branch instruction 130 is the address J2 150.
A branch instruction is either unconditional, meaning the branch is taken every time the instruction is executed, or conditional, meaning the branch is taken or not depending upon a condition. The instructions to execute following a conditional branch are not known with certainty until the condition upon which the branch depends is resolved. Prefetching and executing the instructions at the target address of the branch can lead to a significant performance hit when the branch is "Not Taken". Branches may also be forward, meaning the target address is greater than the address of the branch, or backward, meaning the target address is less than the address of the branch.
Conditional branch instructions such as the instruction 110 in Table 1a typically use a condition field. The value of this field is typically set by the compare instruction and indicates whether the condition evaluated to TRUE or FALSE. Subsequent instructions examine the field to determine the outcome of the comparison. For example, a subsequent jump instruction (e.g. CJMP instruction 110 in Table 1a) might examine the condition field and perform a jump only if the field indicates the outcome named in the jump instruction (FALSE, in the case of CJMP FALSE instruction 110). Otherwise, program execution continues with the next instruction after the jump instruction in the execution sequence (e.g. MOV instruction 120 in Table 1a). Note that the computer architecture may be defined to store the condition field either in a general purpose register or in a special purpose register such as a status register or flags register. The architecture may also be defined to combine the functions of the compare instruction and the branch instruction into a single compare-and-branch instruction which does not make reference to any condition field. The present invention applies equally well to these alternative embodiments.
As stated above, branch instructions may cause a non-sequential change in the fetching of instructions, i.e. the next instruction to be executed may not sequentially follow the previously executed instruction. The direction of a branch instruction is not known until the branch is executed, which typically occurs in the execute stage of a pipelined processor. While the branch instruction makes its way to the execute stage, the intervening stages between the fetch stage and the execute stage are filled based upon a prediction of the direction of the branch. When the branch direction is not correctly predicted, any instructions fetched and partially executed as a result of the incorrect prediction must be flushed from the pipeline. Even a misprediction rate of 5 percent results in a substantial loss in performance due to the number of instructions incorrectly fetched and partially executed in reliance on the wrong prediction. Further delays are incurred while the processor fetches the correct instructions to execute following the branch. It is therefore highly desirable in pipelined computers to eliminate as many branch instructions as possible from a computer program.
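The cost of misprediction can be estimated with a simple first-order model in which each mispredicted branch adds a fixed number of lost cycles to an otherwise ideal pipeline. Only the 5 percent misprediction rate comes from the text above; the base of one cycle per instruction, the 20 percent branch frequency and the 10-cycle penalty used in the example are hypothetical figures chosen for illustration, and cycles_per_instruction is an illustrative name.

```python
def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction when mispredicted branches flush the pipeline.

    First-order model: each mispredicted branch costs `penalty_cycles` extra
    cycles (flushing wrongly fetched instructions and fetching the correct
    ones); all other effects are ignored.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical figures: CPI 1.0, 20% branches, 5% mispredicted, 10-cycle penalty.
    cpi = cycles_per_instruction(1.0, 0.20, 0.05, 10)
    print(f"CPI = {cpi:.2f}, slowdown = {100 * (cpi - 1.0):.0f}%")
```

With these assumed figures the model gives 1.10 cycles per instruction, a 10 percent slowdown; because the penalty enters linearly, a deeper pipeline with a larger flush penalty increases the loss proportionally.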
Data dependencies are another phenomenon which can degrade the performance of pipelined computers. A data dependency exists between two instructions if the execution of one depends upon the results of executing the other. Each instruction has the potential to stall later instructions that depend on it. Table 1c shows how data dependencies can cause pipeline stalls.
TABLE 1c
______________________________________
160    ADD    r1, r2, r3
161    LOAD   [r3], r6
162    AND    r5, r8, r7
163    ADD    r6, r7, r1
164    SUB    r3, r4, r5
______________________________________
The instructions in Table 1c have an opcode, followed by two or three registers. Instructions that use three registers have two source registers followed by a destination register. An instruction using two registers has a source register followed by a destination register. When a register is enclosed in brackets, then the register is used to address memory. For example, the instruction LOAD [r3], r6 161 loads register r6 with the contents of the memory address in register r3.
In Table 1c, the ADD instruction 163 depends on the preceding LOAD instruction 161. The dependency is on the contents of register r6, which the LOAD instruction 161 fills with the contents of the memory address specified in r3. In this example, the LOAD instruction 161 takes, for example, four clock cycles to complete, so it has not completed by the time the ADD instruction 163 is ready to execute, only two clock cycles after the LOAD instruction 161 begins execution. The ADD instruction 163 therefore stalls in the pipeline for two clock cycles, until the LOAD instruction 161 updates register r6 and the ADD instruction 163 can add the contents of register r6 to the contents of register r7. Note that the SUB instruction 164 following the ADD instruction 163 depends neither on the completion of the LOAD instruction 161 nor on the completion of the ADD instruction 163. Therefore, the SUB instruction 164 can be executed while waiting for the LOAD instruction 161 to complete.
Out-of-order (OOO) execution can be used to lessen or eliminate the effect of stalls due to instruction dependencies. Upon encountering an instruction that depends on an instruction still in execution, the OOO execution processor checks for later independent instructions in the program and executes these later instructions before the dependent instruction (such as in the case of the SUB instruction 164 in Table 1c). This reduces the impact of execution stalls because the execution of later independent instructions is overlapped with the execution of instructions requiring multiple clocks to complete.
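The stall in Table 1c and the benefit of out-of-order issue can be reproduced with a small scheduling model. The sketch assumes single issue (at most one instruction starts per cycle), operands read at issue, a 4-cycle LOAD as in the example above and 1 cycle for every other instruction; TABLE_1C, LATENCY and the function names are hypothetical. The in-order scheduler issues strictly in program order; the out-of-order scheduler issues the oldest instruction whose operands are ready and that would not violate a RAW, WAR or WAW ordering with an older instruction.

```python
# Table 1c as (label, opcode, source registers, destination register).
# The bracketed [r3] of the LOAD is a source: it supplies the memory address.
TABLE_1C = [
    (160, "ADD",  ("r1", "r2"), "r3"),
    (161, "LOAD", ("r3",),      "r6"),
    (162, "AND",  ("r5", "r8"), "r7"),
    (163, "ADD",  ("r6", "r7"), "r1"),
    (164, "SUB",  ("r3", "r4"), "r5"),
]
LATENCY = {"LOAD": 4}   # hypothetical: LOAD takes 4 cycles, everything else 1


def latency(op):
    return LATENCY.get(op, 1)


def in_order_issue(program):
    """Issue strictly in program order, stalling until source operands are ready.

    Out-of-order completion of writes is not modelled.
    """
    ready = {}          # register -> cycle at which its newest value is available
    issue = {}
    cycle = 0
    for label, op, srcs, dst in program:
        cycle = max([cycle] + [ready.get(r, 0) for r in srcs])  # RAW stall
        issue[label] = cycle
        ready[dst] = cycle + latency(op)
        cycle += 1
    return issue


def can_issue(instr, older_waiting, ready, cycle):
    """True if instr may issue now without violating RAW, WAR or WAW ordering."""
    _, op, srcs, dst = instr
    if any(ready.get(r, 0) > cycle for r in srcs):
        return False    # RAW: a source value is still being produced
    if ready.get(dst, 0) >= cycle + latency(op):
        return False    # WAW: an in-flight older write would not land first
    for _, _, o_srcs, o_dst in older_waiting:
        if o_dst in srcs or dst in o_srcs or dst == o_dst:
            return False    # RAW, WAR or WAW with an older unissued instruction
    return True


def out_of_order_issue(program):
    """Each cycle, issue the oldest instruction that can legally issue."""
    ready, issue = {}, {}
    waiting = list(program)
    cycle = 0
    while waiting:
        for idx, instr in enumerate(waiting):
            if can_issue(instr, waiting[:idx], ready, cycle):
                label, op, _, dst = instr
                issue[label] = cycle
                ready[dst] = cycle + latency(op)
                del waiting[idx]
                break
        cycle += 1
    return issue


if __name__ == "__main__":
    print("in order:    ", in_order_issue(TABLE_1C))
    print("out of order:", out_of_order_issue(TABLE_1C))
```

In-order issue starts ADD 163 at cycle 5 instead of cycle 3 (the two-cycle stall) and SUB 164 at cycle 6; out-of-order issue fills cycle 3 with SUB 164, so the sequence finishes issuing one cycle earlier.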
There are three types of data dependencies which can occur in computer programs: read-after-write (RAW) dependencies, write-after-write (WAW) dependencies, and write-after-read (WAR) dependencies. RAW dependencies occur when an instruction requires the result of a previous instruction. WAW dependencies occur when two instructions write the same register and therefore the writes must occur in the order specified by the program to guarantee subsequent instructions receive the correct value. WAR dependencies occur when an instruction writes the same register that was read by a previous instruction, and therefore the write must occur after the read to guarantee that the correct value is read. Table 1d illustrates the three types of data dependencies.
TABLE 1d
______________________________________
180    ADD    r1, r2, r3
190    SUB    r3, r4, r5
192    SHR    r6, r7, r4
194    OR     r8, r9, r3
______________________________________
In Table 1d, the ADD instruction 180 writes R3 with the sum of the values in R1 and R2. Register R3 is subsequently read by the SUB instruction 190. A RAW dependency exists that prevents the SUB instruction 190 from being executed prior to the ADD instruction 180. The SUB instruction 190 reads register R4 which is subsequently written by the SHR instruction 192. A WAR dependency exists that prevents the SHR instruction 192 from being executed prior to the SUB instruction 190 even though the SHR instruction 192 does not use the result of the SUB instruction 190. Finally, the ADD instruction 180 writes register R3 which is subsequently written by the OR instruction 194. A WAW dependency exists that prevents the OR instruction 194 from being executed prior to the ADD instruction 180, even though the OR instruction 194 does not use the result of the ADD instruction 180.
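The three kinds of dependency can be found mechanically by comparing the source and destination registers of every pair of instructions in program order. A minimal sketch for Table 1d follows (TABLE_1D and find_dependencies are illustrative names). Because it checks every pair, it reports the three dependencies discussed above and also the WAR dependency on r3 between the SUB instruction 190 and the OR instruction 194.

```python
from itertools import combinations

# Table 1d as (label, source registers, destination register).
TABLE_1D = [
    (180, ("r1", "r2"), "r3"),  # ADD
    (190, ("r3", "r4"), "r5"),  # SUB
    (192, ("r6", "r7"), "r4"),  # SHR
    (194, ("r8", "r9"), "r3"),  # OR
]


def find_dependencies(program):
    """Return sorted (kind, earlier label, later label, register) tuples.

    combinations() yields each pair with the earlier instruction first, so
    every pair is examined once in program order.
    """
    deps = []
    for (a, a_srcs, a_dst), (b, b_srcs, b_dst) in combinations(program, 2):
        if a_dst in b_srcs:
            deps.append(("RAW", a, b, a_dst))   # b reads what a writes
        if b_dst in a_srcs:
            deps.append(("WAR", a, b, b_dst))   # b overwrites what a reads
        if a_dst == b_dst:
            deps.append(("WAW", a, b, a_dst))   # both write the same register
    return sorted(deps)


if __name__ == "__main__":
    for dep in find_dependencies(TABLE_1D):
        print(dep)
```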
Due to the limited number of registers in a processor, the same register will typically be written by multiple instructions during execution of a single program. In Table 1d above, because the OR instruction 194 writes register R3, a WAW dependency is created with the ADD instruction 180. If the OR instruction 194 used a register other than R3 as a destination, for example register R10, the WAW dependency would be eliminated. The same reasoning applies to the SHR instruction 192; if the SHR instruction 192 used a register other than R4 as a destination, the WAR dependency with the SUB instruction 190 would be eliminated. Thus WAR and WAW dependencies are "artificial" dependencies created by multiple instructions using the same register as a destination. Although impractical, WAR and WAW dependencies could be completely eliminated by constructing a program such that the same register is never written more than once.
A technique known in the prior art as "register renaming" can be used to eliminate WAR and WAW dependencies. Register renaming operates by changing the name of the destination register of every instruction from the name assigned by the high-level language compiler (typically referred to as a virtual register) to a unique name in another namespace (typically referred to as a physical register). Table 1e depicts the instructions from Table 1d both before and after register renaming.
TABLE 1e
______________________________________
Before Renaming        After Renaming
______________________________________
ADD r1, r2, r3         ADD r1, r2, rA
SUB r3, r4, r5         SUB rA, r4, rB
SHR r6, r7, r4         SHR r6, r7, rC
OR  r8, r9, r3         OR  r8, r9, rD
______________________________________
The virtual register destination of each instruction is renamed to a unique (typically sequential) physical register name (for example rA, rB, etc.) and this new physical name is provided to all subsequent instructions which read the corresponding virtual register. Register renaming is commonly employed in OOO execution processors to eliminate WAR and WAW dependencies and therefore increase the number of independent instructions. Renaming is performed early in the pipeline, prior to execution, so that the instruction issue and execution logic do not see any WAR or WAW dependencies.
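The renaming step can be sketched with a simple map from virtual to physical names, assuming physical names are drawn in sequence from rA, rB, and so on (rename, BEFORE and fmt are illustrative names, and this toy version has only 26 fresh names). Applied to the instructions of Table 1d, it reproduces the right-hand column of Table 1e.

```python
from string import ascii_uppercase

# Instructions of Table 1d as (opcode, source registers, destination register).
BEFORE = [
    ("ADD", ("r1", "r2"), "r3"),
    ("SUB", ("r3", "r4"), "r5"),
    ("SHR", ("r6", "r7"), "r4"),
    ("OR",  ("r8", "r9"), "r3"),
]


def rename(program):
    """Give every destination a fresh physical name (rA, rB, ...).

    Sources are looked up in the current map before the destination is renamed;
    registers that have not been written yet keep their original names.
    """
    mapping = {}                                # virtual -> current physical name
    fresh = (f"r{c}" for c in ascii_uppercase)  # toy pool: rA .. rZ only
    renamed = []
    for op, srcs, dst in program:
        new_srcs = tuple(mapping.get(r, r) for r in srcs)
        mapping[dst] = next(fresh)
        renamed.append((op, new_srcs, mapping[dst]))
    return renamed


def fmt(instr):
    """Format an instruction in the operand order used by the tables."""
    op, srcs, dst = instr
    return f"{op} {', '.join(srcs + (dst,))}"


if __name__ == "__main__":
    for before, after in zip(BEFORE, rename(BEFORE)):
        print(f"{fmt(before):<16}{fmt(after)}")
```

Because the SHR and OR instructions receive fresh destinations rC and rD, neither one conflicts any longer with the registers read or written by the ADD and SUB instructions.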
In the prior art conditional move (see Table 1b), the destination of the move instruction is conditionally modified, that is, if the condition evaluates to TRUE then the destination is modified, otherwise the destination is not modified. This conditional modification of the destination causes complications in the implementation of register renaming. At the time when register renaming is performed, the result of evaluating the condition is not known, in general, and therefore it is not known whether the destination register of the conditional move will be modified (and should therefore be renamed). Table 1f depicts two versions of an example code sequence, one in which the destination register of the conditional move is renamed and one in which it is not. It will be apparent to those skilled in the art that, in both situations, the name of the source register of the SUB instruction is ambiguous until the condition is resolved. The SUB instruction should read rA if the CMOVE instruction does not perform a move, or the destination of the CMOVE instruction if it does perform a move. Because of this ambiguity, register renaming is made more complex by the prior art conditional move.
TABLE 1f
______________________________________
Original Code        CMOVE Renamed        CMOVE Not Renamed
______________________________________
ADD r1, r2, r3       ADD r1, r2, rA       ADD r1, r2, rA
CMOVE r4, r5, r3     CMOVE r4, r5, rB     CMOVE r4, r5, r3
SUB r3, r6, r7       SUB r?, r6, rC       SUB r?, r6, rC
______________________________________
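The ambiguity in the "CMOVE Renamed" column of Table 1f can be made concrete by letting the renamer track a set of candidate physical names for each virtual register. This is a hypothetical sketch (ORIGINAL and rename_with_candidates are illustrative names): it assumes the last operand of CMOVE is its destination, as for the other instructions in this section, and it does not attempt to resolve the ambiguity. An unconditional write replaces the candidate set, while a conditional write can only add to it, so the SUB instruction ends up with two candidates for its first source.

```python
from string import ascii_uppercase

# "Original Code" column of Table 1f as
# (opcode, source registers, destination register, conditional write?).
# Assumption: the last CMOVE operand is its destination, as for the other opcodes.
ORIGINAL = [
    ("ADD",   ("r1", "r2"), "r3", False),
    ("CMOVE", ("r4", "r5"), "r3", True),
    ("SUB",   ("r3", "r6"), "r7", False),
]


def rename_with_candidates(program):
    """Rename destinations while tracking every physical name a register may hold.

    An unconditional write replaces the candidate set. A conditional write can
    only add to it, because at rename time it is not known whether the move
    will happen.
    """
    candidates = {}                                # virtual -> set of physical names
    fresh = (f"r{c}" for c in ascii_uppercase)
    renamed = []
    for op, srcs, dst, conditional in program:
        src_names = [candidates.get(r, {r}) for r in srcs]
        new = next(fresh)
        if conditional:
            candidates[dst] = candidates.get(dst, {dst}) | {new}
        else:
            candidates[dst] = {new}
        renamed.append((op, src_names, new))
    return renamed


if __name__ == "__main__":
    for op, src_names, dst in rename_with_candidates(ORIGINAL):
        print(op, [sorted(s) for s in src_names], dst)
```

The SUB entry comes out with the candidate set {rA, rB} for its first source, which is exactly the r? shown in Table 1f.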
It is apparent from the preceding discussion that existing schemes for conditionally moving data in a computer system have several disadvantages. In particular, it would be advantageous to conditionally move data in a computer system without the use of conditional branch instructions. It would also be advantageous to conditionally move data without incurring the ambiguities and complexities of the prior art implementations when register renaming and out-of-order execution are involved.