The present invention relates to microprocessors, and, more particularly, to processing instructions in a superscalar microprocessor.
In a superscalar microprocessor, there are multiple execution resources, which can operate simultaneously, and thus produce multiple results per clock cycle. There may be different classes of resources which can perform a task and satisfy an instruction. An instruction stream is decoded, the necessary dependency information is recorded, and the instructions are stored in a structure known as an instruction shelf. An instruction scheduler determines which instructions currently in the instruction shelf are “ready” for execution and what class of resource is required by each. An instruction is determined to be ready when all the other instructions it depends on have been executed or are being executed. Whenever an instruction is ready and a required class of resource is available, the instruction is picked for execution.
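The readiness test and resource check described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the names `ShelfEntry`, `ready_entries` and `pick`, and the representation of dependencies as a set of producer names, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ShelfEntry:
    name: str                 # hypothetical label for the instruction
    resource_class: str       # class of resource the instruction requires
    depends_on: set = field(default_factory=set)  # names of producer instructions


def ready_entries(shelf, issued):
    """Entries whose producers have all been executed or are being executed."""
    return [e for e in shelf if e.depends_on <= issued]


def pick(shelf, issued, free_resources):
    """Pick each ready entry for which a resource of the required class is free."""
    free = dict(free_resources)  # copy so the caller's counts are untouched
    picked = []
    for entry in ready_entries(shelf, issued):
        if free.get(entry.resource_class, 0) > 0:
            free[entry.resource_class] -= 1
            picked.append(entry.name)
    return picked


shelf = [
    ShelfEntry("load", "mem"),
    ShelfEntry("add", "alu", {"load"}),  # waits for the load
    ShelfEntry("mul", "alu"),
]
picked_now = pick(shelf, issued=set(), free_resources={"mem": 1, "alu": 1})
```

In this example `add` is not picked in the first cycle because its producer has not yet issued, while `mul`, which arrived later, is.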
If instructions are scheduled in strict program order, i.e. each instruction is considered for scheduling only when all of its predecessors have been executed, the scheme is called in-order execution. However, there can be long execution latency associated with instructions performing memory accesses, complex arithmetic operations, etc. Hence, in an in-order superscalar processor, a lengthy instruction prevents all subsequent instructions from being scheduled even though some of them are ready and resources are available. This results in poor utilization of resources. An out-of-order processor allows instructions to be scheduled out of strict program order. If an instruction is ready and a resource is available, then that instruction is scheduled ahead of its predecessors, which may be waiting for an appropriate resource, for example.
In an out-of-order instruction shelf, instructions are scheduled or dispatched irrespective of their arrival order. Instructions always arrive at the shelf in program order. In order to schedule an instruction to a particular resource, the scheduler needs to choose one instruction from a set of possibly many ready instructions requiring that class of resource. Thus the scheduling approach may be based on a priority previously assigned to or associated with each instruction. Alternatively, in some implementations the scheduling approach may be to randomly choose an instruction from among the ready instructions. In the former case, one way of prioritizing the instructions is by age, i.e. if two instructions are ready, the one which arrived earlier at the instruction shelf is chosen over the one which arrived later.
In an in-order processor, age priority can be inherently built in if the instruction shelf structure is a first-in, first-out (FIFO) memory. A FIFO is implemented as a group of registers, each capable of storing an instruction. The instructions fill the FIFO from the bottom to the top. Instructions are dispatched only from the bottommost slots and, subsequently, the empty slots created at the bottom are filled by shifting down the contents of the registers above them. The fact that older instructions are below younger instructions in the shelf, together with the approach that an instruction is dispatched only from the bottom of the shelf, enforces age priority among the instructions in the shelf.
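A minimal software model of such a FIFO shelf is given below for illustration. The class name `FifoShelf`, its methods, and the use of a Python list (index 0 standing for the bottom slot) are assumptions of this sketch; a hardware FIFO would use a fixed group of registers.

```python
class FifoShelf:
    """Toy model of an in-order shelf; index 0 is the bottom (oldest) slot."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = []

    def insert(self, instr):
        """Fill the next free slot above the current contents; False if full."""
        if len(self.slots) >= self.depth:
            return False
        self.slots.append(instr)
        return True

    def dispatch(self, width=1):
        """Remove up to `width` entries from the bottom and shift the rest down."""
        count = min(width, len(self.slots))
        dispatched = self.slots[:count]
        self.slots = self.slots[count:]  # models the downward register shift
        return dispatched
```

Because entries leave only from the bottom, the oldest instruction is always dispatched first without any explicit age bookkeeping.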
In an out-of-order instruction shelf, implicit priority cannot be achieved through a pure FIFO. That is because an instruction could be dispatched from any location or slot of an out-of-order shelf. Subsequently a “hole” is created which may be filled by a newly arriving instruction. By filling a hole with a newly arriving instruction, the physical bottom-to-top priority order in the shelf would be lost. To preserve the order, the holes can be “collapsed”, which means that the instructions from the slots immediately above the hole are shifted down to fill it (FIGS. 1 and 2). The new instructions, in that case, enter the shelf from the top and thus the order is maintained. The collapsing method is the most widely used out-of-order instruction shelf approach. The collapsing method maintains fairness in scheduling, but has some severe implementation bottlenecks which prevent it from operating at a very high clock frequency.
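The collapsing behavior can be illustrated with a short sketch. The function names `collapse` and `dispatch_and_collapse`, and the use of `None` to mark a hole, are assumptions of this example and are not taken from the figures.

```python
def collapse(slots):
    """Shift valid entries down into holes (None), preserving their order."""
    kept = [s for s in slots if s is not None]
    return kept + [None] * (len(slots) - len(kept))


def dispatch_and_collapse(slots, picked_indices):
    """Dispatch from arbitrary slots, then collapse the resulting holes.

    slots[0] is the bottom (oldest) slot. Returns (dispatched, new_slots).
    """
    picked = set(picked_indices)
    dispatched = [slots[i] for i in sorted(picked)]
    with_holes = [None if i in picked else s for i, s in enumerate(slots)]
    return dispatched, collapse(with_holes)


# "B" is dispatched from the middle; "C" and "D" shift down by one slot,
# so a new arrival entering at the top is still above every older entry.
dispatched, shelf_after = dispatch_and_collapse(["A", "B", "C", "D", None, None], [1])
```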
An example case is used here to demonstrate the difficulty. Consider a shelf of depth N=32 and a number of resources, or dispatch width, W=4. Every cycle, each slot of the shelf needs to determine how many holes are created beneath it (from 0 to 4) and shift its entry down by that many slots. That amounts to individually adding all the inverted valid bits of the slots below the entry in question. For the topmost entry this is a 31-wide 1-bit adder, which comprises a 2-bit adder followed by 4 stages of 3-bit adders, given that the result cannot be greater than 4. Noting that a 2-bit adder comprises 2 gate levels and a 3-bit adder comprises 3 gate levels, the total number of gate levels after optimization is 2+4*3=14.
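The per-slot shift amount and the gate-level estimate from this example can be reproduced with a few lines of code. The function name `shift_amounts` and the sample valid vector are assumptions chosen for illustration; the sketch computes in software what the adder tree would compute in hardware.

```python
def shift_amounts(valid):
    """For each slot, count the holes (inverted valid bits) beneath it.

    valid[0] is the bottom slot; a false value marks a hole.
    """
    shifts = []
    holes_below = 0
    for is_valid in valid:
        shifts.append(holes_below)
        if not is_valid:
            holes_below += 1
    return shifts


# Six-slot example with holes in slots 1 and 4.
example_shifts = shift_amounts([1, 0, 1, 1, 0, 1])

# Gate-level estimate from the text: one 2-bit adder (2 levels)
# followed by 4 stages of 3-bit adders (3 levels each).
gate_levels = 2 + 4 * 3
```

The top slot of the sample vector has two holes beneath it and therefore shifts down by two, and the gate-level total evaluates to 14 as stated above.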
Collapse logic is not only lengthy; its delay also adds to that of every other pipelined operation performed on the shelf. This is because the entries are susceptible to down-shifting every cycle. Information regarding an entry computed in a cycle therefore cannot be registered in the slot currently occupied by the entry; rather, it needs to be registered in the slot the entry is set to move to. For instance, when an entry is found to be eligible by the scheduler, the scheduler needs to note the shift amount and mark as “scheduled” the slot to which the entry will be moving. In the subsequent cycle, the slot marked “scheduled” is dispatched. Schedule and shift, both lengthy operations, are serialized and thus severely limit the cycle speed.
An example of a conventional superscalar processor is described in U.S. Pat. No. 5,896,542 to Iadonato et al., which includes a tag monitor system for assigning and storing tags for multiple instructions. The tag monitor system includes a tag FIFO for arranging respective tags in the same program order as the instructions.
In view of the foregoing background, it is therefore an object of the invention to increase the instruction dispatch speed in a superscalar microprocessor having an out-of-order instruction shelf.
These and other objects, features and advantages in accordance with the present invention are provided by a microprocessor including a plurality of resources for executing instructions, and an out-of-order instruction shelf. The instruction shelf has an instruction pool with a plurality of slots therein for storing respective instructions, and an instruction age tracker for storing therein a matrix of rows and columns of logic states associated with relative ages of instructions. The logic states in a given column and row of the matrix are associated with a respective slot of the instruction pool. Also, the microprocessor includes an instruction scheduler for performing at least one logic function on each column of the matrix to determine an oldest instruction, for dispatching instructions to the plurality of resources based thereon, and for updating the matrix based upon dispatched instructions.
The instruction age tracker may comprise a plurality of single-bit registers which define the matrix, and each slot of the instruction pool may comprise a register for storing an instruction and instruction dependency information. Also, each column of logic states of the matrix preferably defines a priority tag for a corresponding slot of the instruction pool, while each logic state may comprise at least one of first and second binary states. The first binary state indicates the presence of an older instruction stored in another slot of the instruction pool. Preferably, each row of logic states of the matrix corresponds to one of the slots of the instruction pool, and the instruction scheduler sets a respective one of the rows to the second binary state when an instruction in a corresponding slot of the instruction pool is dispatched.
The out-of-order instruction shelf may further comprise a valid vector indicating whether each of the slots of the instruction pool includes an instruction. Furthermore, the scheduler may dispatch instructions and update the matrix within a clock cycle.
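One possible software reading of the instruction pool, age matrix and valid vector is sketched below. It is an interpretation under stated assumptions, not the claimed circuit: the class `AgeMatrixShelf`, its method names, the choice of 1 as the first binary state, the way a column is written when an instruction arrives, and the masking of rows by a `ready` vector are all hypothetical details introduced for the example.

```python
class AgeMatrixShelf:
    """Toy model: older[r][c] == 1 means slot r holds an older instruction than slot c."""

    def __init__(self, n):
        self.n = n
        self.pool = [None] * n          # instruction pool
        self.valid = [False] * n        # valid vector
        self.older = [[0] * n for _ in range(n)]  # age matrix of single bits

    def insert(self, instr):
        """Store instr in any free slot (holes included); return the slot or None."""
        for c in range(self.n):
            if not self.valid[c]:
                self.pool[c] = instr
                # Assumed update: every currently valid slot is older than the newcomer.
                for r in range(self.n):
                    self.older[r][c] = 1 if self.valid[r] else 0
                self.valid[c] = True
                return c
        return None

    def oldest(self, ready):
        """Slot of the oldest ready instruction: its column holds no 1 in a ready row."""
        for c in range(self.n):
            if self.valid[c] and ready[c] and not any(
                self.older[r][c] and ready[r] for r in range(self.n)
            ):
                return c
        return None

    def dispatch(self, slot):
        """Dispatch the instruction and set its row to the second binary state (0)."""
        instr = self.pool[slot]
        self.pool[slot] = None
        self.valid[slot] = False
        self.older[slot] = [0] * self.n
        return instr


shelf = AgeMatrixShelf(4)
for name in ("A", "B", "C"):
    shelf.insert(name)          # A -> slot 0, B -> slot 1, C -> slot 2
shelf.dispatch(1)               # B leaves, creating a hole in slot 1
shelf.insert("D")               # D fills the hole below C without any shifting
oldest_of_c_and_d = shelf.oldest([False, True, True, False])
```

Here `D` sits physically below `C` after filling the hole, yet the column for slot 1 still records `C` as older, so the age order is recovered without collapsing the shelf.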
Objects, features and advantages in accordance with the present invention are also provided by a method of tracking instruction priority in an out-of-order instruction shelf of a microprocessor, including storing respective instructions in a plurality of slots of an instruction pool, and storing, in an instruction age tracker, a matrix of rows and columns of logic states associated with relative ages of instructions. Again, the logic states in a given column and row are associated with a respective slot of the instruction pool. Furthermore, the method includes performing a logic function on each column of the matrix to determine the relative ages of the instructions stored in respective slots of the instruction pool.
Instructions are dispatched based upon the relative ages, and the matrix is updated upon dispatching instructions. Preferably, the matrix is updated during a same clock cycle that instructions are dispatched.