Many techniques have been developed in the art of microprocessor design to improve system performance by reducing program completion time; that is, by reducing the number of clock cycles required to perform the fetch, decode, and execute steps of program instructions. One well-known technique is pipelining. Pipelining is analogous to an assembly line: sequentially connected units operate in parallel, each performing a different step of a different instruction, so that the steps of successive instructions overlap and the total cycle time is reduced.
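By way of illustration only, the cycle savings from overlapping the fetch, decode and execute steps can be estimated with a short Python sketch. It assumes an idealized pipeline in which every stage takes one clock cycle and no stalls occur; the function names are hypothetical and do not come from any cited reference.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline; each later instruction
    completes one cycle after its predecessor (idealized, no stalls)."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    # Three stages (fetch, decode, execute) and ten instructions.
    print(unpipelined_cycles(10, 3))  # 30
    print(pipelined_cycles(10, 3))    # 12
```

Under these assumptions the pipelined time approaches one instruction per cycle as the instruction count grows; real pipelines fall short of this because of the dependencies discussed next.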
As described in MIKE JOHNSON, SUPERSCALAR MICROPROCESSOR DESIGN (1991), the efficiency of such a "scalar" pipeline microprocessor, which executes one instruction at a time, can be improved further by expanding the pipeline stages so that the fetch, decode and/or execute steps are each performed simultaneously on multiple instructions. Johnson denotes this type of processor as a "superscalar" microprocessor. A second feature of superscalar processors is the ability to perform the steps of instruction completion out of the strict sequential order of the program. However, in order to correctly execute instructions in parallel or with out-of-order instruction completion, the dependencies of each instruction on every other instruction must be taken into consideration. There are three types of instruction dependencies, referred to as resource conflicts, procedural dependencies and data dependencies. Resource conflicts occur when two instructions executed in parallel contend for access to the same resource, e.g., the system bus. A procedural dependency occurs where execution of an instruction depends on the outcome of execution of a previous instruction, such as a branch instruction. Generally, it cannot be determined ahead of time whether or not the branch will be taken (although branch prediction algorithms can often predict the correct branch with high accuracy). Data dependencies occur when the completion of a first instruction changes the value stored in a register or memory location that is subsequently accessed by a second, later-completing instruction.
Data dependency is a heavily studied topic in superscalar processor design. Data dependencies can be classified into three types, referred to as "true data dependencies," "antidependencies" and "output data dependencies." An instruction which uses a value computed by a previous instruction has a "true" (or data) dependency on the previous instruction. An output dependency arises in out-of-order completion where two sequential instructions both assign values to the same register or memory location and a later, third instruction uses the value stored in that register or memory location as an operand. The earlier of the two instructions cannot complete after the later one, or else the third instruction will use the wrong value. An antidependency also arises in out-of-order execution, where a later instruction, executed out of order and before a previous instruction, may produce a value that destroys a value used by the previous instruction. As illustrations of true dependency, output dependency and antidependency, consider the following sequence of instructions:
(1) R3 := R3 op R5
(2) R4 := R3 + 1
(3) R3 := R5 + 1
(4) R7 := R3 op R4
Instruction (2) has a true dependency on instruction (1) since the value stored in R3, to be used as an operand in instruction (2), is determined by instruction (1). Instruction (3) has an antidependency on instruction (2) since instruction (3) modifies the contents of register R3, which instruction (2) reads. If instruction (3) is executed out of order and before instruction (2), then instruction (2) will use the wrong value stored in register R3 (in particular, the value as modified by instruction (3)). Instructions (1) and (3) have an output dependency. Instruction (1) cannot complete out of order and after instruction (3) because the resulting value, as determined by instruction (3), must be the last value stored in register R3, not the resulting value as determined by instruction (1), so that instruction (4) will execute on the correct operand value stored in register R3.
In order to resolve these dependencies/storage conflicts, a technique known in the art as register renaming can be implemented. According to register renaming, additional "substitute" registers are provided for purposes of reestablishing the correspondence between registers and values. Whenever an executed instruction is intended to write a result value to a particularly named register, the processor typically dynamically allocates one of the substitute registers for storing the result instead of the particularly named register. A subsequently executed instruction that uses the particularly named register as an operand is provided instead the result value stored in the substitute register. For instance, consider the above sequence of instructions as implemented with register renaming:
(1) R3b := R3a op R5a
(2) R4b := R3b + 1
(3) R3c := R5a + 1
(4) R7b := R3c op R4b
As shown above, each assignment to a particular register, R3, R4, R5 or R7, creates a new instance of the register, e.g., R3b or R3c, with the suffix "a" denoting the instance that exists before the sequence begins. Note that instruction (2), which uses the value stored in register R3 as an operand, is provided the value stored in register R3b, namely, the value stored in the substitute register provided for the result of instruction (1), and not the value of the substitute register R3c provided for storing the result of instruction (3). Likewise, instruction (4) is provided the value of the substitute register R3c (provided for storing the result of instruction (3)), and not the value of the register instance R3a (read by instruction (1)) or of the substitute register R3b (provided for storing the result of instruction (1)).
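The effect of renaming on the four-instruction sequence discussed above may be checked with a short sketch. The following Python is illustrative only: the tuple encoding of an instruction as (destination, sources) and the function names find_dependencies and rename are assumptions made for this example, and the letter suffixes simply mimic the register-instance naming used in the text.

```python
# Each instruction is (destination register, tuple of source registers).
PROGRAM = [
    ("R3", ("R3", "R5")),  # (1) R3 := R3 op R5
    ("R4", ("R3",)),       # (2) R4 := R3 + 1
    ("R3", ("R5",)),       # (3) R3 := R5 + 1
    ("R7", ("R3", "R4")),  # (4) R7 := R3 op R4
]


def find_dependencies(program):
    """Return a set of (kind, earlier, later) tuples, numbering from 1."""
    deps = set()
    last_writer = {}  # register -> number of its most recent writer
    readers = {}      # register -> instructions that read it since that write
    for j, (dest, srcs) in enumerate(program, start=1):
        for reg in srcs:
            if reg in last_writer:
                deps.add(("true", last_writer[reg], j))
            readers.setdefault(reg, []).append(j)
        for i in readers.get(dest, []):
            if i != j:
                deps.add(("anti", i, j))
        if dest in last_writer:
            deps.add(("output", last_writer[dest], j))
        last_writer[dest] = j
        readers[dest] = []
    return deps


def rename(program):
    """Give every write a fresh instance: suffix 'a' is the initial value,
    and each assignment advances the destination to the next letter."""
    version = {}
    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(r + version.setdefault(r, "a") for r in srcs)
        version[dest] = chr(ord(version.get(dest, "a")) + 1)
        renamed.append((dest + version[dest], new_srcs))
    return renamed


if __name__ == "__main__":
    print(sorted(find_dependencies(PROGRAM)))
    print(rename(PROGRAM))
    print(sorted(find_dependencies(rename(PROGRAM))))
```

In this sketch the original sequence yields the antidependency between (2) and (3) and the output dependency between (1) and (3), while the renamed sequence retains only the true dependencies, which renaming cannot remove.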
To implement register renaming, a superscalar microprocessor system may incorporate a reorder buffer, which contains a number of storage locations for entries that are dynamically allocated to instruction results. When an instruction is decoded, its result value is assigned a reorder buffer storage location, and its destination register number is associated with this location. Thus, the destination register is "renamed" to the reorder buffer location. A tag, or temporary hardware identifier, is created to identify the result, and the tag is also stored in the assigned reorder buffer storage location. When a subsequent instruction refers to the renamed destination register, the instruction obtains the value stored in the reorder buffer or, if the value has not yet been computed, the tag for this value.
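The allocation and operand lookup just described may be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any particular hardware: the class and method names are hypothetical, tags are plain integers, and an operand with no pending entry is assumed to be read from the register file.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    tag: int                     # temporary identifier for the result
    dest: str                    # destination register being renamed
    value: Optional[int] = None  # None until the result is computed


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries: List[Entry] = []  # oldest entry first
        self.next_tag = 0

    def allocate(self, dest: str) -> int:
        """At decode: reserve a location for the result and return its tag."""
        entry = Entry(self.next_tag, dest)
        self.next_tag += 1
        self.entries.append(entry)
        return entry.tag

    def write_result(self, tag: int, value: int) -> None:
        """At completion: store the computed result in the tagged location."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.value = value
                return
        raise KeyError(tag)

    def read_operand(self, reg: str, register_file: Dict[str, int]) -> Tuple[str, int]:
        """Return ("value", v) if available, else ("tag", t) for the newest
        entry that renames reg; fall back to the register file (assumption)."""
        for entry in reversed(self.entries):
            if entry.dest == reg:
                if entry.value is not None:
                    return ("value", entry.value)
                return ("tag", entry.tag)
        return ("value", register_file[reg])


if __name__ == "__main__":
    regs = {"R3": 7, "R5": 2}
    rob = ReorderBuffer()
    tag = rob.allocate("R3")
    print(rob.read_operand("R3", regs))  # ('tag', 0)
    rob.write_result(tag, 9)
    print(rob.read_operand("R3", regs))  # ('value', 9)
```

Searching from the newest entry backward ensures that a later instruction sees the most recent rename of a register when several pending results target the same destination.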
The reorder buffer in a superscalar microprocessor architecture is typically included within a scheduling unit, which will also contain reservation stations and a register file. A reservation station is a buffer assigned to a particular functional unit (a device that executes instructions such as an arithmetic logic unit or ALU, floating point unit or FPU, etc.), which temporarily stores decoded instructions pending execution in its respective functional unit. The reservation stations contain the logic circuitry required to eliminate resource conflicts, as may occur when more than one instruction requires the same resource at the same time. The register file receives completed result updates in proper program sequence from the reorder buffer, which contains the renamed destination registers as described above.
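The in-order update of the register file from the reorder buffer may likewise be sketched. The following Python is a minimal model, assuming that each buffer entry is a dictionary with hypothetical "dest" and "value" keys and that a value of None marks a result that has not yet been computed.

```python
from collections import deque


def retire(reorder_buffer, register_file):
    """Copy completed results from the head of the buffer to the register
    file in program order, stopping at the first still-pending entry."""
    retired = []
    while reorder_buffer and reorder_buffer[0]["value"] is not None:
        entry = reorder_buffer.popleft()
        register_file[entry["dest"]] = entry["value"]
        retired.append(entry["dest"])
    return retired


if __name__ == "__main__":
    regs = {"R3": 0, "R4": 0}
    # The younger instruction (R4) has finished before the older one (R3).
    rob = deque([{"dest": "R3", "value": None}, {"dest": "R4", "value": 8}])
    print(retire(rob, regs))   # [] because the oldest entry is still pending
    rob[0]["value"] = 5
    print(retire(rob, regs))   # ['R3', 'R4']
    print(regs)                # {'R3': 5, 'R4': 8}
```

Although the second result was computed first, it reaches the register file only after the older result, which preserves the sequential state of the program.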
Certain X86 architecture microprocessors have 32-bit registers that can be accessed in whole or in part. For example, the 386 has a 32-bit register named EAX. The least significant 16 bits may be separately accessed as a 16-bit register named AX. An access (read or write) to the AX register only affects the 16 least significant bits of the EAX register. Likewise, the 8 most significant bits of the AX register may be separately accessed as an 8-bit register called AH, and the 8 least significant bits of the AX register may be separately accessed as an 8-bit register called AL.
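The relationship among EAX, AX, AH and AL can be made concrete with bit masks. The sketch below is illustrative only: the FIELDS table and the helper names read_reg, write_reg and overlaps are assumptions introduced for this example, and the 32-bit register is modeled as a Python integer.

```python
# (bit offset, width in bits) of each name within the 32-bit EAX register.
FIELDS = {"EAX": (0, 32), "AX": (0, 16), "AL": (0, 8), "AH": (8, 8)}


def read_reg(eax: int, name: str) -> int:
    """Extract the named whole or partial register from EAX."""
    shift, width = FIELDS[name]
    return (eax >> shift) & ((1 << width) - 1)


def write_reg(eax: int, name: str, value: int) -> int:
    """Return EAX with only the bits of the named register replaced."""
    shift, width = FIELDS[name]
    mask = ((1 << width) - 1) << shift
    return (eax & ~mask & 0xFFFFFFFF) | ((value << shift) & mask)


def overlaps(a: str, b: str) -> bool:
    """True if the two names share at least one bit of EAX."""
    sa, wa = FIELDS[a]
    sb, wb = FIELDS[b]
    return sa < sb + wb and sb < sa + wa


if __name__ == "__main__":
    eax = 0x12345678
    print(hex(read_reg(eax, "AX")))          # 0x5678
    print(hex(read_reg(eax, "AH")))          # 0x56
    print(hex(write_reg(eax, "AL", 0xFF)))   # 0x123456ff
    print(overlaps("AL", "AH"))              # False
    print(overlaps("AX", "AH"))              # True
```

Because AL and AH share no bits, writes to them do not disturb each other, whereas an access to AX or EAX touches the bits of both.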
The prior art discloses a number of register renaming and in-order result write-back schemes. However, all such prior art techniques assume that all register accesses are to physically distinct registers, with no capability for partial word (i.e., 16-bit or 8-bit) accesses.
U.S. Pat. No. 4,992,938 discloses a register renaming system which eliminates output dependencies and allows computations aliased to the same register to proceed in parallel. This technique uses a Mapping Table, a Pending Target Queue, a Store Queue, and a Free List, to map register numbers to a set of registers within the system.
U.S. Pat. No. 5,134,561 discloses a renaming system which identifies particular addressable registers. An Array Control List and Decode Register Assignment List provide register renamings. A back-up register Assignment List preserves old information while out-of-sequence and conditional branch instructions are executed.
U.S. Pat. No. 5,345,569 discloses a reorder buffer which employs pointers to resolve data dependencies.
As stated above, each of the prior art patents described above uses a register renaming technique which only accesses complete register words. Therefore, there is no partial word accessing capability in the prior art for result writes and operand reads. As such, when partial register accesses are performed, no out-of-order or parallel instruction completion techniques can be used to speed up such instructions. Furthermore, other instructions which access the full register (e.g., the EAX register) and which are sequentially near the partial word access instructions may be delayed to ensure that none of the above data dependency constraints is violated. Hence, the performance efficiency of the prior art is limited, since there is no capability for parallel execution of instructions having partial word operands.
Accordingly, it is an object of the present invention to improve the instruction completion efficiency of an out-of-order issue/execute superscalar microprocessor by providing a reorder buffer architecture and method which uses partial word accessing to enhance parallel instruction execution.