1. Field of the Invention
The present invention relates to an improvement of out-of-order CPU architecture regarding code flexibility. In particular, it relates to a method for operating an out-of-order processor having an architecture of a larger bitlength with a program comprising instructions compiled to produce instruction results of a smaller bitlength.
2. Description of the Prior Art
The present invention has a quite general scope which is not limited to a vendor-specific processor architecture because its key concepts are independent therefrom.
Despite this fact, it will be discussed with reference to a specific prior art processor architecture.
FIG. 1 shows a schematically depicted prior art out-of-order processor 100, in this example an IBM S/390 processor having as an essential component a so-called Instruction Window Buffer, further referred to herein as IWB. The IWB is also depicted in FIG. 2 with reference numeral 110.
After coming from an instruction cache 160 and passing through a decode and branch prediction unit 170, the instructions are dispatched still in-order. In this out-of-order processor the instructions are allowed to be executed, and the results written back into the IWB, out-of-order.
In other words, after the instructions have been fetched by a fetch unit 170, stored in the instruction queue 140 and renamed in a renaming unit 115, see FIG. 2, they are stored in-order into a part of the IWB called reservation station (RS) 120. From the reservation station the instructions may be issued out-of-order to a plurality of instruction execution units 180, abbreviated herein as IEU, and the speculative results are stored in a temporary register buffer, called reorder buffer 125, abbreviated herein as ROB. These speculative results are committed (or retired) in the actual program order, thereby transforming the speculative results into the architectural state within a register file 130, a so-called Architected Register Array (ARA). In this way it is assured that the out-of-order processor, with respect to its architectural state, behaves like an in-order processor. The communication between the renaming unit 115, reservation station 120, reorder buffer 125 and register file 130 is done with a multiplexer element 150.
In the above-mentioned, exemplarily cited prior art processor, the central components of an out-of-order processor are implemented as a unified buffer, the above-said Instruction Window Buffer (IWB).
Next, the Instruction Window Buffer components are described in some more detail and with reference to FIGS. 2 and 3 while introducing the problems of handling 32 bit instruction results in said exemplarily chosen 64 bit S/390 architecture.
From the instruction queue 140 up to 4 instructions are dispatched each cycle in program order to the IWB. The IWB pipeline is depicted in FIG. 3 and starts with renaming 310 of said up to 4 dispatched instructions. The renaming process translates the source logical register address into a physical address specifying where the speculative result resides or will be stored after execution. Furthermore, it allocates new ROB entries for the storage of the speculative results after execution of the dispatched instructions.
The detection of a dependency of a source register with the target register of an instruction that resides in the IWB is done by the renaming logic by comparing the source operand register addresses with the target operand register addresses stored for each entry.
Next, the match(0 . . . 63) signals generated for each entry are ANDed with a so-called “current_bit(0 . . . 63)”. A current_bit(i) is only ON when an instruction i is the youngest instruction in the IWB for the specific logical target register address. It should be noted that ANDing the match(0 . . . 63) string with the current_bit(0 . . . 63) string, thereby generating the RSEL(0 . . . 63) string in FIG. 2, is needed since several matches may be found for the same logical target address. However, only the match with the youngest instruction specifies the correct dependency. It should be noted further that instead of a current bit a priority filter logic could also be used to filter out the youngest match, thereby generating the RSEL(0 . . . 63) string for an operand. The generation of the RSEL(0 . . . 63) string has been described here for a single operand, but it will be clear that when more operands or more instructions are renamed, such an RSEL(0 . . . 63) string is generated for each operand.
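The match and current-bit filtering described above can be illustrated with a minimal software sketch. It is not the hardware implementation; the names IwbEntry, generate_rsel and dispatch, as well as the valid field, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

IWB_ENTRIES = 64  # matches the 0..63 index range used in the text


@dataclass
class IwbEntry:
    valid: bool = False        # entry holds an instruction (assumed field)
    target_reg: int = -1       # logical target register address
    current_bit: bool = False  # ON only for the youngest writer of target_reg


def generate_rsel(entries: List[IwbEntry], source_reg: int) -> List[int]:
    """Return RSEL(0..63) for one source operand as a list of 0/1 bits."""
    rsel = []
    for entry in entries:
        # match(i): the source address equals the stored target address
        match = entry.valid and entry.target_reg == source_reg
        # ANDing with current_bit(i) keeps only the youngest matching writer
        rsel.append(1 if (match and entry.current_bit) else 0)
    return rsel


def dispatch(entries: List[IwbEntry], index: int, target_reg: int) -> None:
    """Place a new writer in entry `index` and make it the current one."""
    for entry in entries:
        if entry.valid and entry.target_reg == target_reg:
            entry.current_bit = False  # an older writer is no longer youngest
    entries[index] = IwbEntry(valid=True, target_reg=target_reg, current_bit=True)
```

If two instructions writing the same logical register are dispatched into entries 3 and 10 in that order, both entries match a later source read of that register, but only entry 10 keeps its current bit, so RSEL has a single bit set.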
In the next “read ROB” cycle 320, the RSEL(0 . . . 63) string selects the entry of the ReOrder Buffer (ROB) to be read. As a result, the tag, data validity bit and target data (if available) will appear at the output ports of the ROB 125 for each source operand at the end of the second cycle. Depending on the protocol that the IEUs support, the tag, validity and data may not be read in the same cycle. In other words, after the read out of the tag, the read out of the data or the validity bits may be realized in separate cycles to maintain the consistency of the data between reorder buffer ROB 125 and reservation station 120. In case there is no dependency (RSEL(0 . . . 63)=“00 . . . 00”), the “read_ARA” signal is switched ON by the ROB, causing the operand data to be read from the ARA 230 addressed by the logical address. This ends the “read ROB” cycle.
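The operand read described above, including the fall-back to the ARA when no dependency exists, may be sketched as follows. This is an illustrative model under stated assumptions only: RobEntry and read_operand are hypothetical names, the tag, validity and data are returned together although the text notes they may be read in separate cycles, and RSEL is assumed to have at most one bit set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RobEntry:
    tag: int                    # identifies the producing instruction
    valid: bool = False         # data validity bit
    data: Optional[int] = None  # target data, if already produced


def read_operand(
    rsel: List[int],
    rob: List[RobEntry],
    ara: List[int],
    logical_reg: int,
) -> Tuple[Optional[int], bool, Optional[int]]:
    """Return (tag, valid, data) for one source operand."""
    if not any(rsel):
        # No dependency: RSEL is all zeros, so the "read_ARA" path is taken
        # and the operand comes from the architected register array.
        return (None, True, ara[logical_reg])
    index = rsel.index(1)  # assumption: at most one RSEL bit is set
    entry = rob[index]
    # Data is only meaningful when the validity bit is set
    return (entry.tag, entry.valid, entry.data if entry.valid else None)
```

An operand whose producer has not yet executed is returned with its tag and a cleared validity bit, so that the reservation station can later capture the result by tag.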
Next, in the “write RS” cycle 330, the tag, validity and data are written into the entry of the reservation station 220 allocated to the renamed instructions. Again, depending on the tag and data protocol of the IEUs, the writing of the validity and data bits into the reservation station may be delayed.
In the next cycle, the “select” cycle 340, the instructions for which the data was written into the reservation station in the previous cycle will be included in the set of instructions that are considered by the select logic for issue. In the IWB the select logic selects, for each IEU, the oldest instruction that waits for issue. This logic is implemented by a priority filter as described in the above-referenced patent application. As a result of the select logic a string issue(0 . . . 63) is generated for the IEUs. A bit issue(i)=“1” specifies that this entry in the RS 220 has to be issued to an IEU.
The generation of one or more issue(0 . . . 63) strings by the select logic ends the select cycle. It should be noted that the select logic may select instructions for issue out of the normal program order, depending on the availability of the source data for each instruction.
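The select step can be modeled, in a simplified way, as choosing per IEU the oldest waiting entry whose source data is available. The sketch below is an assumption-laden illustration, not the priority filter of the referenced application: RsEntry, select and the explicit age field (smaller means older) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RsEntry:
    waiting: bool = False  # holds an instruction that has not been issued yet
    ready: bool = False    # all source operands are valid
    ieu: int = 0           # execution unit the instruction is destined for
    age: int = 0           # hypothetical age stamp: smaller means older


def select(rs: List[RsEntry], num_ieus: int) -> Dict[int, List[int]]:
    """Return one issue(0..N-1) bit string per IEU."""
    issue = {unit: [0] * len(rs) for unit in range(num_ieus)}
    for unit in range(num_ieus):
        candidates = [
            i for i, e in enumerate(rs)
            if e.waiting and e.ready and e.ieu == unit
        ]
        if candidates:
            # Pick the oldest ready candidate for this unit
            oldest = min(candidates, key=lambda i: rs[i].age)
            issue[unit][oldest] = 1
    return issue
```

Because an older entry that is still waiting for source data is skipped, a younger ready instruction can be issued ahead of it, which is the out-of-order behavior noted above.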
In the issue cycle 350, the issue(0 . . . 63) strings specify the RS entry that has to be read out, and at the end of the cycle the data, control, tag, etc. bits will appear at the RS ports to the IEUs.
Then the execution of the instruction is performed in the cycles “exe 1” 360 and “exe 2” 370. The tags, specifying the entry where the data has to be stored in the ROB 125 and the RS 120, are compared with the stored tags for the sources. In case of a match, the validity bit is set and the result data is stored in the sources of the dependent instruction in the RS.
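The tag comparison at the end of execution can be pictured with the following minimal sketch. It assumes, for illustration only, that the tag is the index of the ROB entry receiving the result; SourceOperand and write_back are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceOperand:
    tag: Optional[int]          # tag of the producing instruction, if any
    valid: bool = False         # validity bit of the operand data
    data: Optional[int] = None  # operand data once available


def write_back(
    rob: List[Optional[int]],
    reservation_station: List[List[SourceOperand]],
    result_tag: int,
    result_data: int,
) -> int:
    """Store a result and wake up dependent sources; return how many matched."""
    # Assumption: the tag directly indexes the ROB entry for the result
    rob[result_tag] = result_data
    woken = 0
    for sources in reservation_station:
        for src in sources:
            # Compare the result tag with the stored tag of each pending source
            if not src.valid and src.tag == result_tag:
                src.valid = True
                src.data = result_data
                woken += 1
    return woken
```

Sources that are already valid or that wait on a different tag are left unchanged.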
Finally, the commit process frees up the IWB entries in the original program order by copying the data from the ROB to the ARA 130.
As soon as the data has been written into the ARA it has become the architectural state of the processor and the IWB entries can be used again to store the new instructions dispatched by the fetch unit.
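In-order commit can be sketched as retiring entries only from the oldest end of the window, stopping at the first instruction whose result is not yet available. The model below is a simplified illustration; IwbSlot and commit are hypothetical names, and the window is represented as a queue in program order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class IwbSlot:
    target_reg: int             # logical target register address
    done: bool = False          # speculative result is available in the ROB
    data: Optional[int] = None  # speculative result


def commit(window: Deque[IwbSlot], ara: List[int]) -> int:
    """Retire finished instructions from the oldest end; return the count."""
    retired = 0
    # Stop at the first unfinished instruction to preserve program order
    while window and window[0].done:
        slot = window.popleft()           # frees the entry for reuse
        ara[slot.target_reg] = slot.data  # result becomes architectural state
        retired += 1
    return retired
```

A younger instruction that has finished execution stays in the window until every older instruction has been committed, so the ARA always reflects an in-order execution.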