1. Field of the Invention
This invention relates generally to processors and computers, and more particularly, to a method and apparatus that partially decodes instructions into micro-ops before renaming destination registers.
2. Description of the Related Art
Modern processors and computers have increased their operating speeds and efficiency through a variety of methods and structures. Many processors and similar hardware structures increase instruction throughput by executing instructions either in parallel or out-of-order of the original instruction sequence. The instruction throughput of an out-of-order processor may be improved by using auxiliary hardware structures and special methods that make the execution order more flexible. The hardware structures and special methods include fetchers that employ branch prediction, parallel decoders, large reorder buffers, dependency determination, and renaming of destination registers. These hardware structures and methods for treating the incoming instruction sequence may affect the processor's throughput by synergistic interactions.
FIG. 1 illustrates a portion 10 of an out-of-order processor. A fetcher 12 retrieves a sequence of instructions from memory or caches 14. The fetcher 12 sends the retrieved instructions to one or more decoders 16. The decoders 16 translate the complex instructions, e.g., macro instructions, into simpler hardware executable micro-operations (micro-ops). The decoders 16 send a sequence of micro-ops to a renamer 18. The renamer 18 reassigns additional physical registers to replace destination registers of the micro-ops, i.e. the registers for storing results. The renamer 18 may also record data on the dependencies between the micro-ops and on the reassignment of additional physical registers in a dependency table 20. The renamer 18 assigns the renamed micro-ops entries in a reorder buffer 22 and sends the micro-ops to a scheduler 24. The scheduler 24 assigns instructions for execution in an order that may not follow the original order of the instruction sequence. The scheduler 24 does not assign dependent instructions for execution before the instructions on which they depend. The scheduler 24 consults and updates the dependency and register assignment information in the dependency table 20. A retirement unit 28 removes executed instructions from the reorder buffer 22 in the original instruction order and sends the results to registers and/or caches 30.
FIG. 2A illustrates the effect of renaming on the out-of-order execution of write-after-write instructions, i.e. two instructions having the same destination register. At block 40, three instructions 42, 44, 46 are shown in the original order of an instruction sequence. The instructions 42, 44, 46 have been decoded, i.e. they are micro-ops.
The source and destination addresses can include both memory locations and registers. While macro instructions may have a variable number of addresses, decoded instructions, i.e. micro-ops, have a limited and fixed number of destination and source addresses for a given micro-architecture. In most computers, micro-ops do not have more than two source addresses, e.g., A and B for the first instruction 42, and one destination address, e.g., the register R.sub.1 for the first and second instructions 42, 44. Some computers may employ micro-ops with more source or destination addresses, but the micro-ops of each architecture still have a limited number of addresses.
Due to the small register sets in many computers, the same register may appear in two or more instructions. For example, the register R.sub.1 is the destination address of the first and second instructions 42, 44 of block 40 and is one source address of the third instruction 46. The appearance of the same register in separate instructions can create real and artificial dependencies that prohibit the out-of-order execution of the instructions involved and of instructions dependent thereon.
In the processor 10, it is ordinarily advantageous to be able to execute instructions in any order that keeps the execution unit or units 26 continually busy. For example, executing the second instruction 44 before the first instruction 42 may enable the processor 10 to avoid a period in which one of the execution units 26 is inactive. Instruction dependencies can make the results artificially dependent on the execution order and can interfere with the use of out-of-order execution as a means of improving the efficiency of the processor 10.
Block 50 illustrates the result of executing the second instruction 44 out-of-order, i.e. before the first instruction 42. Since the first and second instructions 42, 44 do not depend on each other, they may be executed in any order. Nevertheless, a problem occurs when the execution order of these write-after-write instructions 42, 44 is inverted, because the third instruction 46 depends on the second instruction 44 in block 40, and the third instruction 46 depends on the first instruction 42 in the inverted order of block 50. Executing write-after-write instructions out-of-order may change the results from subsequent dependent instructions.
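The write-after-write hazard described above can be reproduced with a small simulation. The following Python sketch is illustrative only. The operations (add, multiply), the operand names C, D, E, the result register R2, and the initial values are assumptions, since the text does not state what the instructions 42, 44, 46 compute. Only the register pattern (two writes to R1 followed by a read of R1) is taken from the description of block 40.

```python
from operator import add, mul

def run(program, regs):
    """Execute micro-ops in list order; each micro-op is (dest, op, src1, src2)."""
    regs = dict(regs)  # work on a copy so each run starts from the same state
    for dest, op, src1, src2 in program:
        regs[dest] = op(regs[src1], regs[src2])
    return regs

# Hypothetical stand-ins for instructions 42, 44, 46 of block 40.
i42 = ("R1", add, "A", "B")   # first write to R1
i44 = ("R1", mul, "C", "D")   # second write to R1 (write-after-write)
i46 = ("R2", add, "R1", "E")  # reads R1

init = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

in_order = run([i42, i44, i46], init)   # order of block 40
inverted = run([i44, i42, i46], init)   # order of block 50

# The third instruction sees a different R1 in each order.
print(in_order["R2"], inverted["R2"])   # prints: 17 8
```

In the original order the third instruction reads the value written by the second instruction, while in the inverted order it reads the value written by the first, so the two runs disagree.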
Block 52 illustrates how renaming eliminates dependencies when write-after-write instructions are executed out-of-order. The renamer 18 assigns a new additional physical register R'.sub.1 for the destination register of the second instruction 44 in block 40, i.e. the register R.sub.1 is renamed to R'.sub.1 in the instruction 45 of block 52. The renamer 18 also replaces R.sub.1 with R'.sub.1 in instructions dependent on the second instruction 44, i.e. the source register R.sub.1 of the third instruction 46 of block 40 is renamed to R'.sub.1, giving the third instruction 47 of block 52. Renaming uses additional physical registers of the processor 10, such as R'.sub.1, to "rename", i.e. replace, registers appearing as source or destination addresses in instructions. After renaming, the destination register of the first instruction 42 of block 52 is not an address of the third instruction 47 of block 52. Therefore, the execution of the first instruction 42 and the renamed second instruction 45 may be performed in any order without changing the results coming from dependent instructions, such as the third instruction 47 of block 52.
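The effect of the renaming in block 52 can be checked with a short sketch. As before, this is a minimal illustration under stated assumptions: the operations, the operand names C, D, E, the result register R2, and the initial values are hypothetical. Only the renaming of the second instruction's destination and the third instruction's source to R'1 follows the text.

```python
from operator import add, mul

def run(program, regs):
    """Execute micro-ops in list order; each micro-op is (dest, op, src1, src2)."""
    regs = dict(regs)  # copy, so both runs start from identical state
    for dest, op, src1, src2 in program:
        regs[dest] = op(regs[src1], regs[src2])
    return regs

# Hypothetical stand-ins for instructions 42, 45, 47 of block 52.
i42 = ("R1", add, "A", "B")     # unchanged first instruction
i45 = ("R1'", mul, "C", "D")    # destination R1 renamed to R1'
i47 = ("R2", add, "R1'", "E")   # source R1 renamed to R1'

init = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

order_a = run([i42, i45, i47], init)["R2"]  # original order
order_b = run([i45, i42, i47], init)["R2"]  # first two instructions swapped

print(order_a, order_b)  # prints: 17 17
```

Because the third instruction now reads only R1', the relative order of the two writes no longer influences its result.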
As illustrated in FIG. 2B, renaming may also be employed to enable the out-of-order execution of write-after-read instructions, i.e. two instructions wherein the same register is a source address for the earlier instruction and a destination address for the later instruction. At block 60, an instruction 62 having a destination register R.sub.1 is independent of an earlier instruction 64 having the same register R.sub.1 as a source address. Block 66 illustrates the result of executing the first and second instructions 64, 62 of block 60 out-of-order. Now, the destination register R.sub.1 of the earlier executed instruction 62 becomes a source address for the later executed instruction 64. The out-of-order execution of the write-after-read instructions 64, 62 generates a new dependency for the instruction 64. Generally, the new dependency for the instruction 64 means that different results are obtained when the two instructions 62, 64 are executed in the original instruction order of block 60 and in the inverted order of block 66.
At block 68, the original destination register R.sub.1 of the second instruction 62 of the write-after-read sequence of block 60 has been renamed with an additional physical register R'.sub.1. The renamer 18 also renames the source address R.sub.1 to R'.sub.1 in all subsequent instructions that depend on the original second instruction 62 of block 60. The renamer performs a look-up in the dependency table 20 for correspondences between the original destination register and the additional physical registers so that source registers of dependent instructions are renamed consistently with the previously renamed destination register. After renaming, the scheduler 24 may set the first and second instructions 64, 63, of block 68, for execution in the original order or out of the original order without changing the results.
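The table look-up described above can be sketched as follows. This is a simplified model, not the structure of the renamer 18 or the dependency table 20: it renames every destination to a fresh register, keeps a plain dictionary from original register to its latest physical register, and rewrites sources through that dictionary. The register names P1, P2, P3, the operations, the third instruction, and the initial values are all assumptions made for illustration.

```python
from itertools import count
from operator import add, mul

def rename(program):
    """Rename destinations to fresh registers; rewrite sources via a mapping table."""
    table = {}          # original register -> most recent physical register
    fresh = count(1)
    renamed = []
    for dest, op, src1, src2 in program:
        # Look up sources first, so a read sees the mapping from earlier writes only.
        src1, src2 = table.get(src1, src1), table.get(src2, src2)
        phys = f"P{next(fresh)}"
        table[dest] = phys
        renamed.append((phys, op, src1, src2))
    return renamed

def run(program, regs):
    regs = dict(regs)
    for dest, op, src1, src2 in program:
        regs[dest] = op(regs[src1], regs[src2])
    return regs

# Hypothetical write-after-read pair (64 reads R1, 62 writes R1) plus a later reader.
i64 = ("R2", add, "R1", "A")
i62 = ("R1", mul, "B", "C")
later = ("R3", add, "R1", "R2")
init = {"R1": 10, "A": 1, "B": 2, "C": 3}

plain_ok = run([i64, i62, later], init)["R3"]    # original order, no renaming
plain_bad = run([i62, i64, later], init)["R3"]   # inverted order, no renaming

r64, r62, rlater = rename([i64, i62, later])
ren_a = run([r64, r62, rlater], init)[rlater[0]]  # renamed, original order
ren_b = run([r62, r64, rlater], init)[rlater[0]]  # renamed, inverted order

print(plain_ok, plain_bad, ren_a, ren_b)  # prints: 17 13 17 17
```

Without renaming, the inverted order changes the final result. With renaming, the earlier read of R1 is unaffected by the later write, and both orders agree.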
In the prior art, the decoders 16 produce a sequence of micro-ops from the sequence of instructions received from the fetcher 12. The renamer 18 "blindly" renames the temporary destination registers of each micro-op from the decoders 16 by replacing each temporary destination register with one of the additional physical registers of the processor 10. As long as additional physical registers are available, the renamer 18 continues to assign different physical registers for the temporary destination registers.
The blind decoding and renaming of instructions has, however, several shortcomings. First, the renamer 18 is a multi-ported structure and the renaming process involves look-ups in the dependency table 20 that may slow the instruction throughput of the processor 10. Second, the renamer 18 utilizes long and wide interconnecting wires. Devices using long and wide wires may be constrained to operate at slower speeds due to substantial capacitances associated with such structures. Third, blind renaming uses up more additional physical registers and leaves fewer additional physical registers available for renaming later instructions. Fourth, though the speed of a processor using blind decoding and renaming may be increased by adding parallel structures or more ports to renamers 18, renamers with parallel structures or more ports occupy more of the scarce chip surface area. Finally, renaming is not always helpful in facilitating out-of-order execution, because some dependencies are real. In summary, blind renamer structures of the prior art may be inefficient, use more space on the chip surface, operate more slowly due to substantial internal capacitances, need more additional physical registers, and may not facilitate flexible execution of all instructions.
FIG. 3 illustrates a situation 70 in which renaming a destination register does not facilitate out-of-order execution. At block 71, a macro instruction 72 is decoded into micro-ops 73, 74. The first micro-op 73 has a destination register T which is also a source address for the second micro-op 74. Furthermore, later dependent instructions of the instruction sequence (not shown) do not have T as a source register, because the register T only appeared from decoding the original macro instruction 72 into the two micro-ops 73, 74. The two micro-ops 73, 74 both create and consume the temporary register T. Furthermore, the second micro-op 74 is dependent on the first micro-op 73. Therefore, the second micro-op 74 cannot be assigned for execution before the first micro-op 73. If the renamer 18 blindly replaces all destination registers by additional physical registers, the original register T is replaced by the additional physical register T' at block 76. However, the dependent micro-op 78 still cannot be executed before the first micro-op 77. Thus, no advantage is obtained by renaming the original destination register T of the original first micro-op 73. Renaming has made an additional physical register, e.g., T', unavailable for renaming other instructions, has wasted processor time in doing the look-ups, and has not facilitated out-of-order operation.
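The situation of FIG. 3 can be mimicked with a small sketch of a blind renamer. The model is deliberately simplified and is not the circuit of the renamer 18: micro-ops are (destination, source, source) tuples, the free-register pool is a plain list, and the operand names A, B, C, the final destination R1, and the pool entries P1 to P3 are assumptions. Only the temporary register T, its replacement T', and the dependence of the second micro-op on the first are taken from the text.

```python
def blind_rename(micro_ops, free_regs):
    """Replace every destination with the next free physical register."""
    free = list(free_regs)
    table = {}      # original register -> physical register
    renamed = []
    for dest, src1, src2 in micro_ops:
        src1 = table.get(src1, src1)   # look-up for each source
        src2 = table.get(src2, src2)
        phys = free.pop(0)             # consumes one physical register
        table[dest] = phys
        renamed.append((phys, src1, src2))
    return renamed, free

def truly_depends(later, earlier):
    """True (read-after-write) dependency: later reads what earlier writes."""
    return earlier[0] in later[1:]

# Hypothetical micro-ops 73 and 74 decoded from macro instruction 72.
u73 = ("T", "A", "B")    # creates the temporary T
u74 = ("R1", "T", "C")   # consumes the temporary T

renamed, still_free = blind_rename([u73, u74], ["T'", "P1", "P2", "P3"])
u77, u78 = renamed

print(u77, u78, still_free)
print(truly_depends(u74, u73), truly_depends(u78, u77))  # prints: True True
```

The dependency holds both before and after renaming, so the micro-ops must still execute in order, yet the pool of free physical registers has shrunk by the register spent on T.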
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.