The present invention relates generally to data processing and in particular to techniques for processing a conditional move instruction within a data processor.
In general, data processors are capable of executing a variety of instructions. One type of instruction is called a conditional move instruction. From a programmer's perspective, a typical conditional move instruction instructs a processor to test whether a particular condition exists (e.g., whether a particular register stores zero), and to move information into a destination register if the particular condition exists. If the
CMOVXX S_RA, S_RB, D_RC,
where "CMOVXX" indicates that the instruction is a conditional move instruction that tests for a condition "XX". "S_RA" and "S_RB" are source operands that respectively identify registers RA and RB. "D_RC" is a destination operand that identifies register RC.
In general, how a processor uses registers depends on whether the processor is capable of executing instructions out of program order. For a processor that cannot execute instructions out of program order (i.e., an in-order processor), instruction source and destination operands typically identify physical registers within the processor. The pseudo-code for executing the CMOVXX instruction in an in-order processor is as follows:
if (XX(RA)), then RC=RB.
According to the pseudo-code, the processor determines whether a condition XX exists involving physical register RA (e.g., whether physical register RA stores zero). If the condition XX exists, the processor moves the contents of physical register RB into physical register RC. Otherwise, the processor leaves the original contents of physical register RC unaltered.
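The in-order behavior described above can be illustrated with a minimal Python sketch. It is not taken from the original text: the function name cmov_in_order, the dictionary used as a register file, and the choice of "stores zero" as condition XX are all assumptions made for illustration.

```python
def is_zero(value):
    """Hypothetical condition XX: true when the register stores zero."""
    return value == 0


def cmov_in_order(regs, cond, ra, rb, rc):
    """Model 'if (XX(RA)), then RC=RB' on a dict of physical registers."""
    if cond(regs[ra]):
        regs[rc] = regs[rb]  # condition exists: move RB into RC
    # otherwise RC keeps its original contents
    return regs


# Condition exists (RA stores zero): RC receives the contents of RB.
regs_taken = cmov_in_order({"RA": 0, "RB": 7, "RC": 3}, is_zero, "RA", "RB", "RC")

# Condition does not exist: RC is left unaltered.
regs_not_taken = cmov_in_order({"RA": 5, "RB": 7, "RC": 3}, is_zero, "RA", "RB", "RC")
```

Note that only two registers, RA and RB, are read in this model; RC is written but never read.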
In a processor that is capable of executing instructions out of program order (i.e., an out-of-order processor), instruction source and destination operands typically identify logical registers instead of the physical registers directly. The out-of-order processor maps these logical registers to physical processor registers just before instruction execution such that the result of each instruction is stored in a new physical register. This approach enables the processor to avoid problems when executing instructions out of program order (e.g., write-after-read and write-after-write data hazards).
The pseudo-code for executing a CMOVXX instruction in an out-of-order processor is therefore somewhat more complex. Suppose that, prior to mapping the CMOVXX instruction, the out-of-order processor maps logical register RA to physical register RA1, logical register RB to physical register RB1, and logical register RC to physical register RC1. Additionally suppose that, after mapping the CMOVXX instruction, the out-of-order processor maps logical register RC to physical register RC2 (a new physical register). The pseudo-code for executing the CMOVXX instruction in such a processor is therefore as follows:
if (XX(RA1)), then RC2=RB1 else RC2=RC1.
According to the pseudo-code, the out-of-order processor determines whether a condition XX exists involving physical register RA1 (logical register RA). If the condition XX exists, the processor moves the contents of physical register RB1 (logical register RB) into physical register RC2 (to which logical register RC presently is mapped). As such, the contents of logical register RB are stored in logical register RC. If the condition XX does not exist, the processor moves the contents of physical register RC1 (to which logical register RC previously was mapped) into physical register RC2 such that a programmer perceives the contents of logical register RC as remaining unaltered.
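The renamed behavior can be sketched in Python as follows. This is an illustrative model only: the rename map, the function name cmov_out_of_order, and the "stores zero" condition are assumptions and do not come from the original text.

```python
def is_zero(value):
    """Hypothetical condition XX: true when the register stores zero."""
    return value == 0


def cmov_out_of_order(phys, rename_map, cond, ra, rb, rc, new_phys):
    """Model 'if (XX(RA1)), then RC2=RB1 else RC2=RC1' with register renaming."""
    ra1 = rename_map[ra]       # current mapping of logical RA
    rb1 = rename_map[rb]       # current mapping of logical RB
    rc1 = rename_map[rc]       # previous mapping of logical RC
    rename_map[rc] = new_phys  # logical RC now maps to a new physical register
    # Three physical registers are read here: RA1, RB1 and RC1.
    phys[new_phys] = phys[rb1] if cond(phys[ra1]) else phys[rc1]
    return phys, rename_map


# Condition exists: RC2 receives the contents of RB1.
phys_a = {"RA1": 0, "RB1": 7, "RC1": 3}
map_a = {"RA": "RA1", "RB": "RB1", "RC": "RC1"}
cmov_out_of_order(phys_a, map_a, is_zero, "RA", "RB", "RC", "RC2")

# Condition does not exist: RC2 receives the old contents of RC1.
phys_b = {"RA1": 5, "RB1": 7, "RC1": 3}
map_b = {"RA": "RA1", "RB": "RB1", "RC": "RC1"}
cmov_out_of_order(phys_b, map_b, is_zero, "RA", "RB", "RC", "RC2")
```

In both cases the new physical register is always written, which is why the old contents of RC1 must be read as an additional input.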
When a processor executes an instruction within an instruction stream, an execution circuit (or unit) of the processor receives instruction data through input ports, and executes the instruction according to the instruction data. For example, an execution unit of an in-order processor may execute the conditional move instruction:
CMOVXX S_RA, S_RB, D_RC
according to the pseudo-code:
if (XX(RA)), then RC=RB
where RA, RB and RC refer to physical registers within the in-order processor. To receive instruction data used by the CMOVXX instruction, the execution unit requires only two input ports: a first port to receive the contents of physical register RA, and a second port to receive the contents of physical register RB.
However, an execution unit of an out-of-order processor executes the CMOVXX instruction according to the following pseudo-code:
if (XX(RA1)), then RC2=RB1 else RC2=RC1
where RA1, RB1, RC1 and RC2 refer to physical registers within the out-of-order processor. To implement this instruction, the out-of-order execution unit requires three input ports: a first port to receive the contents of physical register RA1, a second port to receive the contents of physical register RB1, and a third port to receive the contents of physical register RC1.
There are disadvantages to a processor that uses three input ports to execute instructions. In particular, such a processor would require substantial semiconductor resources (e.g., a disproportionately large area for input port routing). Additionally, processors typically use no more than two input ports to execute instructions other than conditional move instructions. Accordingly, processor designers generally prefer to limit the number of input ports for each instruction to no more than two. Unfortunately, as explained above, a conventional implementation of the CMOVXX instruction within an out-of-order processor uses three input ports.
In contrast, an embodiment of the present invention is directed to a technique for handling a conditional move instruction in an out-of-order data processor. The technique involves detecting a conditional move instruction within an instruction stream, and generating multiple instructions according to the detected conditional move instruction. The technique further involves replacing the conditional move instruction within the instruction stream with the generated multiple instructions. Preferably, each of the generated multiple instructions executes using no more than two input ports. As such, it is unnecessary for the processor to use three input ports to execute the instructions.
The generation of multiple instructions preferably involves providing a first generated instruction that determines whether a condition exists, and providing a second generated instruction that performs a move operation based on whether the condition exists. In particular, the second generated instruction performs a first move operation when the condition is determined to exist, and a second move operation when the condition is determined not to exist. When the condition exists, the first move operation loads a new physical register with contents from a specified source register so that, from a programmer's perspective, the processor alters a logical register mapped to the new physical register. When the condition does not exist, the second move operation loads the new physical register with contents of a previously used physical register (to which the logical register was previously mapped) so that, from the programmer's perspective, the processor leaves the logical register unaltered.
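The division of work between the two generated instructions can be sketched in Python as below. The sketch models only the semantics of each step and says nothing about how many input ports are used. The function names determine_condition and perform_move, and the "stores zero" condition, are hypothetical.

```python
def is_zero(value):
    """Hypothetical condition XX: true when the register stores zero."""
    return value == 0


def determine_condition(cond, ra_value):
    """First generated instruction: determines whether the condition exists."""
    return bool(cond(ra_value))


def perform_move(condition_exists, source_value, previous_value):
    """Second generated instruction: returns the value loaded into the new physical register."""
    if condition_exists:
        return source_value    # first move operation: the logical register is altered
    return previous_value      # second move operation: the logical register appears unaltered


# Condition exists: the new physical register receives the source contents.
new_value_taken = perform_move(determine_condition(is_zero, 0), 7, 3)

# Condition does not exist: the new physical register receives the previous contents.
new_value_not_taken = perform_move(determine_condition(is_zero, 5), 7, 3)
```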
Instruction generation may involve providing a first generated instruction that produces a condition result, and providing a second generated instruction that (i) inputs the condition result from a first portion of a register that is separate from a second portion that stores standard contents of the register, and (ii) performs an operation according to the first portion. To this end, the mechanisms for storing the condition result and the standard contents are treated as a single entity (e.g., a register with an extra bit field to store the condition result) rather than as separate registers. As such, the same circuitry for addressing and accessing the standard portion of the registers can be used to address and access the condition field. This feature allows the processor to transfer the condition result through one of two existing input ports, alleviating the need for a third input port to carry the condition result. In particular, the processor includes a register file containing instruction registers, each of which has a standard field and a condition field.
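A minimal Python sketch of a register with a standard field and a condition field follows. The text above does not say which register carries the condition result between the two generated instructions. The sketch assumes, purely for illustration, that the first generated instruction writes a temporary register holding the previous destination contents together with the condition result. The names Register, first_generated and second_generated are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Register:
    standard: int = 0        # standard contents of the register
    condition: bool = False  # extra field that stores a condition result


def is_zero(value):
    """Hypothetical condition XX: true when the register stores zero."""
    return value == 0


def first_generated(cond, ra1, rc1):
    """Two inputs (RA1, RC1): evaluates XX and attaches the result to RC1's contents.

    Assumption: the result is written to a temporary register.
    """
    return Register(standard=rc1.standard, condition=bool(cond(ra1.standard)))


def second_generated(temp, rb1):
    """Two inputs (temp, RB1): the condition result arrives inside temp."""
    value = rb1.standard if temp.condition else temp.standard
    return Register(standard=value)


ra1, rb1, rc1 = Register(0), Register(7), Register(3)
rc2_taken = second_generated(first_generated(is_zero, ra1, rc1), rb1)

ra1_nonzero = Register(5)
rc2_not_taken = second_generated(first_generated(is_zero, ra1_nonzero, rc1), rb1)
```

Under this assumption each generated instruction reads exactly two registers, because the condition result travels with the standard contents through an existing input port.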
Prior to detecting the conditional move instruction, instructions may be loaded from memory in groups (e.g., fetch blocks). In particular, the technique may further involve retrieving a first group of instructions from a memory during a first fetch period, the first group of instructions including the conditional move instruction. Such a retrieval enables instructions to be loaded using fewer retrieve operations than loading instructions individually.
Other subsequent groups of instructions may be loaded as well. For example, the technique may further include retrieving a second group of instructions from the memory during a second fetch period, the second group following the first group within the instruction stream. The technique may involve retrieving the second group of instructions from the memory again during a third fetch period while the multiple instructions are generated simultaneously. This feature provides an optimization in that retrieval of the second group of instructions during the third fetch period will make the second group of instructions available at a convenient point in the pipeline to receive one of the generated multiple instructions.
Alternatively, the technique may involve overwriting the conditional move instruction in the retrieved first group of instructions with one of the generated multiple instructions, and overwriting an instruction following the conditional move instruction in the retrieved first group of instructions with another of the generated multiple instructions. In this situation, the instruction following the conditional move instruction is preferably a blank instruction that performs no operation when executed. Accordingly, the processor simply modifies the fetch block containing the conditional move instruction without affecting a subsequent fetch block.
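The in-place overwrite described above can be sketched in Python as follows. The tuple encoding of instructions, the opcode names "CMOV", "CMOV1", "CMOV2" and "NOP", and the operands given to the generated instructions are all assumptions made for illustration.

```python
NOP = ("NOP",)  # hypothetical blank instruction that performs no operation


def overwrite_in_block(block):
    """Overwrite a CMOV and the blank instruction after it, in place.

    Returns True if the fetch block was modified, False otherwise.
    """
    for i in range(len(block) - 1):
        if block[i][0] == "CMOV" and block[i + 1] == NOP:
            _, ra, rb, rc = block[i]
            block[i] = ("CMOV1", ra, rc)      # first generated instruction
            block[i + 1] = ("CMOV2", rb, rc)  # second generated instruction
            return True
    return False


fetch_block = [("ADD", "R1", "R2", "R3"), ("CMOV", "RA", "RB", "RC"), NOP, ("SUB", "R4", "R5", "R6")]
modified = overwrite_in_block(fetch_block)
```

Because the second generated instruction takes the slot of the blank instruction, the fetch block keeps its original length and no later fetch block is touched.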