In an Out-Of-Order (“OOO”) microprocessor, instructions are allowed to issue out of their program order. In most cases, however, they are required to retire from the machine in order. Further, regardless of issue order, memory operations in the machine need to acquire and update memory status in program order. These diverging ordering requirements give rise to problems at several locations in a micro-architecture. For example, in most OOO micro-architectures, elements cannot be allocated into queues such as the load-store queue (LSQ) in the order in which they arrive, which would be more computationally efficient, because the elements need to be removed in program order.
As a result, complexity is often added to the machine, because element tagging and allocation need to take place in all resources at the time of element allocation. For example, the instruction allocation buffer, also known as the “re-order buffer” (“ROB”), needs to perform tagging and allocation of resources at the time of instruction allocation.
For example, FIG. 1 illustrates a pipeline for a conventional OOO microprocessor. Instructions are fetched at the fetch stage 102 and placed in the instruction fetch queue (IFQ) (not shown) within fetch stage 102. The instructions are generally the original assembly instructions found in the executable program. These instructions reference the architectural registers, which are stored in register file 110. If a fetched instruction is interrupted or raises an exception, the architectural register file 110 holds the results of all instructions preceding it in program order. Stated differently, the architectural register file stores the state that needs to be saved and restored in order to return to the program during debugging or otherwise.
In an OOO microprocessor, instructions execute out of order while still preserving data dependence constraints. Because instructions may finish in an arbitrary order, they cannot modify the architectural register file 110 as they finish; doing so would make it difficult to restore the register file's values accurately in the event of an exception or an interrupt. Hence, every instruction that enters the pipeline is provided a temporary register where it can save its result. The temporary registers are eventually written into the architectural register file in program order. Thus, even though instructions are executed out of order, the contents of the architectural register file change as though the instructions were executed in program order.
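A small Python sketch can make the problem concrete. It assumes, purely for illustration, that each instruction writes a single destination register, that instructions finish in a given out-of-order sequence, and that one instruction raises an exception; `Instr`, `run_direct`, and `run_buffered` are hypothetical names. The first policy writes results into the architectural register file as instructions finish; the second gives each instruction a temporary register and copies results to the architectural file in program order.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str             # architectural destination register (illustrative)
    value: int            # result the instruction produces
    faults: bool = False  # True if this instruction raises an exception


def run_direct(program, finish_order):
    """Write results straight into the architectural file as instructions finish."""
    arch = {}
    for idx in finish_order:
        instr = program[idx]
        if instr.faults:
            break  # exception detected; younger results may already be visible
        arch[instr.dest] = instr.value
    return arch


def run_buffered(program, finish_order):
    """Give each instruction a temporary register; copy to arch file in program order."""
    temp = [None] * len(program)  # one temporary register slot per instruction
    arch = {}
    head = 0                      # oldest instruction not yet written to arch
    for idx in finish_order:
        temp[idx] = program[idx]  # instruction idx finishes and saves its result
        # Copy every finished instruction at the head, oldest first.
        while head < len(program) and temp[head] is not None:
            if temp[head].faults:
                return arch       # state reflects only instructions before the fault
            arch[temp[head].dest] = temp[head].value
            head += 1
    return arch
```

With `program = [Instr("r1", 10), Instr("r2", 20, faults=True), Instr("r3", 30)]` and `finish_order = [2, 0, 1]`, `run_direct` returns `{'r3': 30, 'r1': 10}`, exposing the result of an instruction younger than the faulting one, whereas `run_buffered` returns `{'r1': 10}`, which is exactly the state as of the exception.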
The ROB 108 facilitates this process. After the instructions are dispatched from the fetch stage 102, they are decoded by decode module 104 and placed in the ROB 108 and the issue queue (IQ) 106. The ROB 108 and IQ 106 may be part of a scheduler module 172. As instructions are issued out of order from IQ 106, they are executed by execute module 112.
In a conventional OOO micro-architecture, the write back module 114 first writes the resulting values of those instructions back to the temporary registers in ROB 108. The ROB 108 keeps track of the program order in which instructions entered the pipeline and, for each of these instructions, maintains temporary register storage. When the oldest instructions in the ROB have produced valid results, those instructions can be safely “committed.” That is, the results of those instructions can be made permanent, since there is no earlier instruction that can raise a mispredict or exception that may undo their effect. When instructions are ready to be committed, the ROB 108 moves the corresponding values in the temporary registers for those instructions to the architectural register file 110. Therefore, through the ROB's in-order commit process, the results in the register file 110 are made permanent and architecturally visible.
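The ROB's bookkeeping can be sketched as a circular buffer: entries are allocated at the tail in program order, write back fills an entry's temporary value in any order, and commit retires entries from the head only while the head holds a valid result. This Python sketch assumes a fixed capacity and uses a dictionary to stand in for register file 110; `ReorderBuffer` and its methods are hypothetical names, not an actual hardware interface.

```python
class ReorderBuffer:
    """Hypothetical circular-buffer ROB with in-order allocation and commit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.dest = [None] * capacity   # architectural destination per entry
        self.value = [None] * capacity  # temporary register storage per entry
        self.done = [False] * capacity  # True once write back has occurred
        self.head = 0                   # index of the oldest entry
        self.count = 0                  # number of occupied entries

    def allocate(self, dest_reg):
        """Allocate the next entry in program order and return its index."""
        if self.count == self.capacity:
            raise RuntimeError("ROB full: allocation stalls")
        idx = (self.head + self.count) % self.capacity
        self.dest[idx], self.value[idx], self.done[idx] = dest_reg, None, False
        self.count += 1
        return idx

    def write_back(self, idx, result):
        """Record a result in the entry's temporary register, in any order."""
        self.value[idx] = result
        self.done[idx] = True

    def commit(self, arch_regs):
        """Move results to the architectural file, oldest first, while the head is done."""
        committed = 0
        while self.count and self.done[self.head]:
            arch_regs[self.dest[self.head]] = self.value[self.head]
            self.done[self.head] = False
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
            committed += 1
        return committed
```

If the younger of two allocated entries is written back first, `commit` moves nothing; once the older entry is written back, a single `commit` call retires both, in program order. The extra copy from `value` into `arch_regs` during `commit` is the intermediary step whose cost is discussed next.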
Using the ROB 108 as an intermediary between the write back module 114 and the register file 110 introduces a delay at the commit stage in conventional OOO processors. Further, for the ROB 108 to move the values of the temporary registers to the register file 110 quickly during the commit cycle, the ROB needs to be placed in relatively close proximity to the register file 110, which imposes an additional constraint on the design of the OOO architecture.
The instructions issued out of order from the IQ 106 may also comprise loads and stores. A load instruction uses registers in the register file 110 to compute an effective address and, subsequently, brings the data from that address in memory 118 into a register in register file 110. A store similarly uses registers in the register file 110 to compute an effective address, then transfers data from a register to that address in memory 118. Hence, loads and stores must first wait for their register dependencies to be resolved before computing their respective effective addresses. Accordingly, each store instruction is queued in a load/store queue (LSQ) 116 while it waits for a register value to be produced; when it receives the broadcast indicating that the value is available, the effective address computation part of the store is issued.
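The waiting behavior of a store in LSQ 116 can be sketched as follows. The sketch assumes a base-plus-offset addressing mode (an illustrative choice, not stated in the text) and models the result broadcast as a call carrying a (producer tag, value) pair; `StoreEntry`, `on_broadcast`, and `broadcast` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    base_tag: int                   # tag of the instruction producing the base register
    offset: int                     # immediate displacement (assumed addressing mode)
    base_value: Optional[int] = None
    address: Optional[int] = None   # effective address, unknown until the base is ready

    def on_broadcast(self, tag, value):
        """Capture the base register on its producer's broadcast, then compute the address."""
        if self.address is None and tag == self.base_tag:
            self.base_value = value
            self.address = self.base_value + self.offset  # effective address computation
            return True
        return False


def broadcast(lsq, tag, value):
    """Deliver one result broadcast to every LSQ entry; return entries that became ready."""
    return [entry for entry in lsq if entry.on_broadcast(tag, value)]
```

For example, with `lsq = [StoreEntry(base_tag=3, offset=8), StoreEntry(base_tag=5, offset=-4)]`, calling `broadcast(lsq, 3, 0x1000)` computes `0x1008` as the first store's address, while the second store keeps waiting for the producer with tag 5.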
Additionally, store instructions are queued in the LSQ 116 because, when stores are issued out of order from the IQ 106, memory dependencies between loads and stores must be resolved before those instructions can access memory 118. For example, a load can access memory only after it is confirmed that no prior store refers to the same address. Once again, it is the ROB 108 that is used to keep track of the various dependencies between the stores and the loads.
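The conservative rule above, that a load may access memory only once no older store refers, or might refer, to the same address, can be sketched as a check over the older stores in the LSQ. This Python sketch assumes each entry records a program-order sequence number and an effective address (`None` while still unknown), and that accesses are same-sized and aligned so exact address equality detects a conflict. It deliberately omits store-to-load forwarding, which many real designs add. `StoreInfo` and `load_may_access_memory` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StoreInfo:
    seq: int                # program-order sequence number
    address: Optional[int]  # effective address, or None if not yet computed


def load_may_access_memory(load_seq: int, load_addr: int,
                           stores: Sequence[StoreInfo]) -> bool:
    """Return True only if no older store matches, or might match, the load's address."""
    for store in stores:
        if store.seq >= load_seq:
            continue          # younger stores cannot affect this load
        if store.address is None:
            return False      # unknown address: a conflict cannot be ruled out
        if store.address == load_addr:
            return False      # load must wait until this older store updates memory
    return True
```

With an older store to `0x100` and a younger store whose address is still unknown, a load to `0x200` may proceed, a load to `0x100` must wait, and a load that is younger than the unresolved store must also wait.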
The scheduler 172 can also comprise an index array 140 with which the ROB 108 communicates in order to track the various dependencies. The index array 140 stores the tags that the ROB 108 assigns to all load and store instructions dispatched from IQ 106. These tags designate slots in the LSQ 116 for the store instructions, so that the instructions can be allocated in the LSQ 116 in program order. This, in turn, allows the store instructions to access memory 118 in program order. As a result, conventional OOO processors can require additional storage for an index array 140 that holds tags for the respective locations of store instructions in the LSQ. Further, additional communication overhead is required to tag all store instructions, to convey the tags along with the store instructions to the LSQ, and to direct the LSQ to add the store instructions at the locations designated by their respective tags.
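The tag mechanism can be sketched as follows: tags are assigned in program order at dispatch, each tag designates a fixed LSQ slot, and a store that arrives out of order is written into the slot its tag names, so the queue can still be drained oldest first. This Python sketch assumes tags are consecutive integers and that a slot is selected by the tag modulo the queue size; `TaggedLSQ` and its methods are hypothetical, and the index array is not modeled separately.

```python
class TaggedLSQ:
    """Hypothetical LSQ whose slots are designated by program-order tags."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.next_tag = 0  # next tag to hand out (program order)
        self.oldest = 0    # tag of the oldest store not yet sent to memory

    def assign_tag(self):
        """Called at dispatch, in program order (in the text, kept in index array 140)."""
        if self.next_tag - self.oldest == self.size:
            raise RuntimeError("LSQ full: dispatch stalls")
        tag = self.next_tag
        self.next_tag += 1
        return tag

    def insert(self, tag, store):
        """Place an arriving (address, data) store in the slot its tag designates."""
        self.slots[tag % self.size] = store

    def drain(self, memory):
        """Apply stores to memory strictly in tag order; stop at the first missing store."""
        while self.oldest < self.next_tag:
            slot = self.oldest % self.size
            if self.slots[slot] is None:
                break      # an older store has not arrived yet
            addr, data = self.slots[slot]
            memory[addr] = data
            self.slots[slot] = None
            self.oldest += 1
```

Suppose two stores to address `0x40` receive tags 0 and 1, and the younger one (data 2) reaches the LSQ first. `drain` writes nothing until the older store (data 1) arrives; it then applies both in tag order, leaving `{0x40: 2}` in memory, as program order requires. Draining in arrival order would instead leave the older value, 1.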