1. Field of the Invention
This invention is related to the field of superscalar microprocessors and, more particularly, to handling data dependencies between instructions in a microprocessor.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term “clock cycle” refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term “instruction processing pipeline” is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
In order to increase performance, superscalar microprocessors often employ out-of-order execution. The instructions within a program are ordered, such that a first instruction is intended to be executed before a second instruction, etc. One challenge of out-of-order execution is ensuring that the intended functionality of the program is not altered. When the instructions are executed in the order specified, the intended functionality of the program is realized. However, instructions may be executed in any order as long as the original functionality is maintained. For example, a second instruction which does not depend upon a first instruction may be executed prior to the first instruction, even if the first instruction is prior to the second instruction in program order. A second instruction depends upon a first instruction if a result produced by the first instruction is employed as an operand of the second instruction. The second instruction is then said to have a dependency upon the first instruction.
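The dependency rule stated above can be illustrated with a minimal Python sketch. The `Instr` record and the `depends_on` helper are hypothetical names introduced here for illustration only; the sketch models nothing beyond a result of one instruction being used as a source operand of another.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    name: str
    dest: str
    sources: Tuple[str, ...]


def depends_on(second: Instr, first: Instr) -> bool:
    """Return True if `second` uses the result of `first` as a source operand."""
    return first.dest in second.sources


# Program order: first, second, third (register names are illustrative).
first = Instr("add", dest="eax", sources=("ebx", "ecx"))
second = Instr("sub", dest="edx", sources=("eax", "esi"))  # reads eax
third = Instr("mov", dest="edi", sources=("ebp",))         # reads neither result
```

In this simplified model, `second` has a dependency upon `first` and must wait for its result, while `third` depends on neither and could be executed ahead of both without changing the outcome of the program.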
As used herein, a source operand of an instruction is a value to be operated upon by the instruction in order to produce a result. Conversely, a destination operand is the result of the instruction. Source and destination operands of an instruction are generally referred to as operand information. An instruction specifies the location storing the source operands and the location in which to store the destination operand. An operand may be stored in a register (a “register operand”) or a memory location (a “memory operand”). As used herein, a register is a storage location included within the microprocessor which is used to store instruction results. Registers may be specified as source or destination storage locations for an instruction.
An additional difficulty which exists in the x86 instruction set architecture is the ability to update portions of registers. Typically, destination operands may be 8, 16, or 32 bits while the registers are 32 bits. Consequently, dependencies may be created when only a portion of a destination register is being updated. For example, when an instruction that updates only a portion of a destination register is followed by a subsequent instruction which requires the entirety of the same register as a source operand, the subsequent instruction must wait until the prior instruction has executed and produced its result. In addition to the produced result, the subsequent instruction requires for its source operand the portion of the destination register that was not updated by the prior instruction. One possible solution is to wait until the prior instruction is retired and its result has been written to the register file. The subsequent instruction then reads the source operand from the register file. However, forcing the subsequent instruction to wait until the prior instruction is retired before it can read the entire updated register introduces processing delays. Alternatively, the microprocessor may include circuitry which treats 8, 16 and 32 bit operands independently. This may permit the microprocessor to track and update only that portion of the register which is written by the instruction. However, treating different portions of a register independently adds significant complexity to the microprocessor circuitry.
The problems outlined above are in large part solved by a microprocessor and method as described herein. When an instruction is decoded that updates only a portion of the destination register, the destination register is read prior to execution of the instruction. When the instruction is executed, the result of the executed instruction is merged with the previously read data to form the full result register value. Advantageously, any instruction which is dependent on this result as a source operand will have its source operand provided by the prior instruction upon execution of the prior instruction. Further, since other portions of the microprocessor (e.g., load/store unit and reorder buffer) may treat all operands as 32 bits, the other portions may be simplified as they no longer need to consider the size of operands.
Broadly speaking, a microprocessor is contemplated comprising a decode unit configured to decode an instruction and a functional unit configured to execute instructions. The decode unit is configured to detect when an instruction only updates a portion of a destination. In addition, the decode unit is configured to convey operand request information, a decoded instruction and operand size information. The functional unit is coupled to receive a decoded instruction, operand size information, and operand data. In addition to executing an instruction, the functional unit is configured to merge the contents of a destination of an instruction with the results of the execution of the instruction in response to detecting the instruction only updates a portion of the destination. Finally, the functional unit is configured to convey the merged data as the result of the executed instruction.
Further, a method is contemplated. An instruction is decoded. The decoding includes determining if the instruction updates only a portion of a destination. The instruction is subsequently executed to produce an execution result. If it is determined that the instruction only updates a portion of the destination, the execution result is merged with a first data. The merged result is then conveyed as the result of the instruction execution.
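The method above may be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: registers are taken to be 32 bits wide, a partial update is assumed to write the low-order 8 or 16 bits, and the names `MASKS`, `updates_partial`, and `execute_and_merge` are hypothetical.

```python
REG_WIDTH_BITS = 32
FULL_MASK = 0xFFFFFFFF

# Assumption: a partial update always targets the low-order bits of the register.
MASKS = {8: 0xFF, 16: 0xFFFF, 32: FULL_MASK}


def updates_partial(dest_size_bits: int) -> bool:
    """Decode-time check: does the instruction update only part of its destination?"""
    return dest_size_bits < REG_WIDTH_BITS


def execute_and_merge(execution_result: int, prior_dest: int, dest_size_bits: int) -> int:
    """Merge the execution result with the previously read destination contents.

    `prior_dest` plays the role of the "first data" read before execution.
    """
    mask = MASKS[dest_size_bits]
    if not updates_partial(dest_size_bits):
        # Full-width update: the execution result is conveyed unchanged.
        return execution_result & FULL_MASK
    # Keep the untouched upper bits of the old value, take the new low bits.
    return (prior_dest & ~mask & FULL_MASK) | (execution_result & mask)
```

For example, an 8-bit result of `0x34` merged with prior destination contents of `0x12345678` yields the full 32-bit value `0x12345634`, which a dependent instruction can consume directly.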
Further contemplated is a functional unit comprising an arithmetic logic unit, select circuitry, and a plurality of multiplexors. The arithmetic logic unit is coupled to receive operand data and size information and is configured to execute a decoded instruction to produce a first data. The select circuitry is coupled to receive destination operand size information and is configured to convey control signals. Each of the plurality of multiplexors is coupled to receive a portion of the first data and a portion of a second data. Each multiplexor is also configured to receive a control signal from the select circuitry, to convey the portion of the first data in response to detecting a first condition of the control signal, and to convey the portion of the second data in response to detecting a second condition of the control signal. The conveyed portions are then merged to form a third data which is conveyed as the result of the executed instruction.
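The multiplexor arrangement described above can be modeled in software as a per-portion selection. The following Python sketch assumes four byte-wide multiplexors over a 32-bit value, with a control value of `True` standing for the first condition (convey the portion of the first data) and `False` for the second condition (convey the portion of the second data). The function names and the byte-wide granularity are assumptions made for illustration.

```python
NUM_LANES = 4   # assumption: one multiplexor per byte of a 32-bit value
LANE_BITS = 8


def select_controls(dest_size_bits: int) -> list:
    """Model of the select circuitry: one control signal per byte lane.

    Lane 0 is the least significant byte. A lane is True when the
    instruction updates that byte of the destination.
    """
    lanes_updated = dest_size_bits // LANE_BITS
    return [lane < lanes_updated for lane in range(NUM_LANES)]


def mux_merge(first_data: int, second_data: int, dest_size_bits: int) -> int:
    """Form the third data by choosing each byte from first or second data."""
    third_data = 0
    for lane, take_first in enumerate(select_controls(dest_size_bits)):
        shift = LANE_BITS * lane
        source = first_data if take_first else second_data
        # Each multiplexor conveys exactly one byte-wide portion.
        third_data |= ((source >> shift) & 0xFF) << shift
    return third_data
```

With a 16-bit destination, the two low-order lanes take the execution result and the two high-order lanes take the prior register contents, so `mux_merge(0x0000BEEF, 0x12345678, 16)` evaluates to `0x1234BEEF`.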
Still further contemplated is a computer system comprising a microprocessor, a functional unit, and an I/O unit. The microprocessor includes a decode unit configured to decode an instruction and to detect when an instruction only updates a portion of a destination. In addition, the decode unit is configured to convey operand request information, a decoded instruction and operand size information. The functional unit is coupled to receive a decoded instruction, operand size information, and operand data. In addition to executing an instruction, the functional unit is configured to merge the contents of a destination of an instruction with the results of the execution of the instruction in response to detecting that the instruction only updates a portion of the destination. Finally, the functional unit is configured to convey the merged data as the result of the executed instruction.