Modern, high-performance microprocessors use sophisticated instruction scheduling mechanisms and pipelines designed to reorder the startup and completion of instructions in a sequential instruction stream so as to achieve a high level of processor performance. One common form of such mechanisms is a superscalar microprocessor that is capable of fetching, decoding, issuing, executing, completing and retiring more than one instruction within a single cycle of the clock signal used to synchronize activities at the lowest level in the microprocessor. As used hereinbelow, the term instruction refers to the smallest unit of work that is scheduled independently within a microprocessor.
In a typical superscalar microprocessor, instructions are fetched from an instruction cache (I-cache) in program order along the predicted path of execution. The instructions are then decoded to resolve inter-instruction dependencies and are then dispatched into a buffer commonly known as the issue queue (IQ). Then, subject to the availability of both a suitable execution unit (also called a function unit or FU) and the input operands of the instruction, each instruction is eventually issued and executed.
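The issue condition described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `IQEntry` and `select_ready`, the oldest-first selection policy, and the register and FU labels are hypothetical and are not taken from any particular processor.

```python
from dataclasses import dataclass


@dataclass
class IQEntry:
    name: str           # label for the instruction (hypothetical)
    fu_type: str        # kind of function unit required, e.g. "alu"
    sources: frozenset  # registers whose values the instruction reads


def select_ready(issue_queue, ready_regs, free_fus):
    """Issue every entry whose operands are ready and whose FU type is free.

    issue_queue is a list ordered oldest first; issued entries are removed.
    Oldest-first selection is an assumed policy, chosen for simplicity.
    """
    available = dict(free_fus)  # copy so the caller's FU counts are untouched
    issued = []
    for entry in list(issue_queue):  # iterate over a copy while removing
        operands_ready = entry.sources <= ready_regs
        if operands_ready and available.get(entry.fu_type, 0) > 0:
            available[entry.fu_type] -= 1
            issue_queue.remove(entry)
            issued.append(entry)
    return issued


iq = [
    IQEntry("i0", "mul", frozenset({"r1", "r2"})),
    IQEntry("i1", "alu", frozenset({"r9"})),  # r9 has not been produced yet
    IQEntry("i2", "alu", frozenset({"r3"})),
]
issued = select_ready(iq, ready_regs={"r1", "r2", "r3"},
                      free_fus={"alu": 1, "mul": 1})
issued_names = [e.name for e in issued]
waiting_names = [e.name for e in iq]
```

In this example the younger instruction `i2` issues ahead of the older `i1`, which is still waiting for an operand. This is how execution can begin out of program order.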
Because instructions are issued from the IQ to the chosen FU as soon as they are ready for execution, they may start as well as finish execution out of program order. To comply with the sequential semantics of the executing program, the processor state as defined by the contents of the committed or architectural registers, as well as the state of the memory, must be updated in program order despite the fact that instructions can complete out of program order. This requirement is met by collecting the results produced out of program order into another buffer called the reorder buffer (ROB). Information stored in the ROB is used to update the processor and memory state in the original program order.
As each instruction is decoded in program order, an entry is simultaneously established for it in the ROB, which behaves as a first-in, first-out (FIFO) queue. At the time of dispatch, the entry for the dispatched instruction is made at the tail of the ROB. The ROB entry for an instruction can itself serve as the repository of the instruction's results or it may point to the repository of the results within a separate physical register file.
The process of retiring or committing an instruction involves updating the processor's state and/or the memory state in program order, typically using the information stored in the ROB. Instructions are retired from the head of the ROB. If the ROB entry at the head of the ROB is awaiting the completion of the corresponding instruction, instruction retirement is blocked (i.e., halted) momentarily until the results are correctly produced.
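The dispatch, completion and retirement behavior of the ROB described above can be sketched as follows. This is a simplified model under stated assumptions: the class `ReorderBuffer`, its method names and the dictionary-based entry layout are hypothetical, and the sketch takes the first of the two options mentioned above, in which the ROB entry itself holds the result.

```python
from collections import deque


class ReorderBuffer:
    """FIFO of in-flight instructions: head is the oldest, tail the newest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def dispatch(self, name, dest):
        """Allocate an entry at the tail; return None if the ROB is full."""
        if len(self.entries) >= self.capacity:
            return None  # dispatch would have to stall
        entry = {"name": name, "dest": dest, "done": False, "value": None}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """Record a result; may be called in any order."""
        entry["value"] = value
        entry["done"] = True

    def retire(self, arch_regs):
        """Commit completed entries from the head, strictly in program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            arch_regs[entry["dest"]] = entry["value"]
            retired.append(entry["name"])
        return retired


rob = ReorderBuffer(capacity=4)
arch_regs = {}
a = rob.dispatch("i0", "r1")
b = rob.dispatch("i1", "r2")
c = rob.dispatch("i2", "r3")

rob.complete(c, 30)  # the youngest instruction finishes first
rob.complete(b, 20)
first = rob.retire(arch_regs)   # blocked: the head (i0) is not yet complete

rob.complete(a, 10)
second = rob.retire(arch_regs)  # all three now retire in program order
```

The first call to `retire` commits nothing even though two results are available, because the entry at the head is still awaiting completion. Once the head completes, the architectural registers are updated in the original program order.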
To process load and store instructions that move data between memory locations and registers, many modern microprocessors also employ a load-store queue (LSQ), which likewise behaves as a FIFO queue. Entries for load and store instructions are established at the tail of the LSQ, in program order, as the instructions are dispatched. Memory operations are started from the LSQ so as to conform to program ordering.
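A minimal sketch of such a queue is given below. It assumes the simplest possible policy, in which memory operations start strictly from the head of the queue once their address (and, for stores, their data) is known. The class `LoadStoreQueue`, its method names and the entry layout are hypothetical. Practical designs commonly relax this ordering, for example by letting loads bypass earlier stores to different addresses, which the sketch does not model.

```python
from collections import deque


class LoadStoreQueue:
    """FIFO of memory operations, allocated in program order at dispatch."""

    def __init__(self):
        self.entries = deque()

    def dispatch(self, kind, dest=None):
        """Allocate an entry at the tail; kind is "load" or "store"."""
        entry = {"kind": kind, "dest": dest, "addr": None, "data": None}
        self.entries.append(entry)
        return entry

    def start_ready(self, memory, regs):
        """Start operations from the head for as long as the head is ready."""
        started = []
        while self.entries:
            head = self.entries[0]
            if head["addr"] is None:
                break  # address not yet computed: younger entries must wait
            if head["kind"] == "store":
                if head["data"] is None:
                    break  # store data not yet available
                memory[head["addr"]] = head["data"]
            else:
                # Unwritten locations read as 0 (an assumption of this sketch).
                regs[head["dest"]] = memory.get(head["addr"], 0)
            started.append(self.entries.popleft())
        return started


lsq = LoadStoreQueue()
memory, regs = {}, {}
st = lsq.dispatch("store")
ld = lsq.dispatch("load", dest="r5")

ld["addr"] = 0x40  # the load's address is known first
started_before = len(lsq.start_ready(memory, regs))  # load waits behind store

st["addr"], st["data"] = 0x40, 7
started_after = len(lsq.start_ready(memory, regs))   # store, then load
```

Because the load is held behind the older store to the same address, it observes the value written by that store, as program order requires.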
In modern microprocessor systems, the overall design strategy has heretofore been a “one-size-fits-all” approach, where datapath resources such as the IQ, ROB, registers and LSQ are set at predetermined, fixed sizes irrespective of changes in the instantaneous needs of an executing program for these resources. As a result, these resources frequently remain under-utilized. Unused portions of the resources remain powered up, wasting energy and power.