1. Technical Field
The present invention is directed generally to processors. More specifically, the invention is directed to an apparatus, system and method for quickly determining an oldest instruction in a non-moving instruction queue of a processor and selecting it for issuance.
2. Description of Related Art
Most modern processors are super-scalar processors. A super-scalar processor is a processor that has multiple execution units for simultaneously processing multiple instructions. Generally, a super-scalar processor executes an instruction using a plurality of stages. These stages include: fetch, decode, dispatch, issue/execute, retirement and write-back.
In the fetch stage, instructions are loaded from memory into the processor for execution. However, since accessing a system memory is very slow compared with the execution speed of a processor, this stage does not involve a direct read from the memory. Rather, special control circuitry loads larger blocks (e.g., 16 or 32 bytes) of instruction data from the memory into a primary instruction cache, from which the instructions can then be fed rapidly toward the execution units as needed.
In the decode stage, a fetched instruction is examined to determine whether it should be divided into micro-instructions. The time needed to decode an instruction depends on its complexity: simple instructions may be decoded at a rate of several per clock cycle, while more complicated instructions may take more than one cycle each. Any memory addresses the instruction requires are also generated at this stage.
In the dispatch stage, each instruction or micro-instruction is dispatched to an instruction pool or queue, where it awaits assignment to an execution unit. Internal circuitry optimizes this task and controls which instruction or micro-instruction goes to which execution unit. This is sometimes called instruction scheduling, since tasks (micro-instructions) are assigned to available resources (execution units). To simplify the remainder of this disclosure, the terms "micro-instruction" and "instruction" are used interchangeably hereafter.
In the issue/execute stage, an instruction is issued to an execution unit for execution. Because a super-scalar processor normally has multiple execution units, some of them may be dedicated to specific types of instructions; for example, complex floating-point operations are typically handled by floating-point execution units. Consequently, instructions may be executed independently of one another and out of program order.
Because instructions may complete out of order, their results are first stored in temporary locations so that they can be committed in the original program order. A retirement unit collects the results and ensures that the output is produced correctly and in accordance with the intent of the original instructions. This occurs at the retirement stage.
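As an illustration only, the in-order retirement described above can be modeled in a few lines of Python. The class and method names (`ReorderBuffer`, `allocate`, `complete`, `retire`) and the use of integer sequence numbers to represent program order are hypothetical and are not taken from this disclosure; the sketch merely shows results arriving out of order while being retired in program order.

```python
from collections import deque


class ReorderBuffer:
    """Hypothetical model: results may complete out of order but retire in program order."""

    def __init__(self):
        self.pending = deque()  # sequence numbers awaiting retirement, oldest at the left
        self.results = {}       # sequence number -> result, filled in when execution finishes

    def allocate(self, seq):
        """Reserve a place in program order when the instruction is dispatched."""
        self.pending.append(seq)

    def complete(self, seq, result):
        """Record a result; completions may arrive in any order."""
        self.results[seq] = result

    def retire(self):
        """Retire completed instructions from the head, stopping at the first unfinished one."""
        retired = []
        while self.pending and self.pending[0] in self.results:
            seq = self.pending.popleft()
            retired.append((seq, self.results.pop(seq)))
        return retired


rob = ReorderBuffer()
for seq in range(3):
    rob.allocate(seq)
rob.complete(2, "c")  # the youngest instruction finishes first
rob.complete(0, "a")
print(rob.retire())   # [(0, 'a')]: instruction 2 must wait for instruction 1
rob.complete(1, "b")
print(rob.retire())   # [(1, 'b'), (2, 'c')]
```

An instruction whose result arrives early simply waits in the buffer until every older instruction has retired.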
In the write-back stage, the results from the execution units are written back either to an internal register or to the system memory. Again, since accessing the system memory is slow compared with the speed of the processor, a result destined for memory is first written into a write buffer, where it is held until it can be written into the system memory.
To schedule the instructions in the instruction queue for issuance properly and fairly, the scheduler ordinarily uses one of a plurality of algorithms, such as a first-in, first-out (FIFO) algorithm or a last-in, first-out (LIFO) algorithm. When the FIFO algorithm is used, the oldest instruction in the instruction queue is issued before any other. This generally requires that the oldest instruction be known.
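A minimal Python sketch of the two policies follows, assuming (hypothetically) that the queue is an ordered structure kept in dispatch order. In such a structure the oldest instruction is always at a known end, which is the property FIFO issuance relies on. The function name `issue_next` is illustrative only.

```python
from collections import deque


def issue_next(queue, policy):
    """Remove and return the next instruction under a FIFO or LIFO policy.

    The deque is assumed to be kept in dispatch order (left end = oldest),
    so the oldest instruction is always known without any search.
    """
    if not queue:
        return None
    if policy == "FIFO":
        return queue.popleft()  # oldest instruction issues first
    if policy == "LIFO":
        return queue.pop()      # youngest instruction issues first
    raise ValueError(f"unknown policy: {policy}")


q = deque(["i0", "i1", "i2"])
print(issue_next(q, "FIFO"))  # i0
print(issue_next(q, "LIFO"))  # i2
```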
Since instructions in a super-scalar processor may be executed out of order, the oldest instruction in the queue may not always be known, especially when a non-moving instruction queue is used. (In a non-moving instruction queue, a new instruction is placed in any empty location in the queue, so an entry's position does not indicate its age.) Consequently, a search for the oldest instruction in the queue is generally performed before an instruction is issued to an execution unit.
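The search described above can be sketched as follows. This is an assumed software model, not the mechanism of this disclosure: each entry is stamped with a hypothetical dispatch sequence number, a new instruction takes the lowest-numbered empty slot, and `remove` models an instruction leaving the queue out of order. Because slot position does not track age, `find_oldest` must compare every occupied slot.

```python
from typing import List, Optional, Tuple

Entry = Optional[Tuple[int, str]]  # (dispatch sequence number, instruction), or None if empty


class NonMovingQueue:
    """Hypothetical model of a non-moving instruction queue with a linear oldest-entry search."""

    def __init__(self, size: int):
        self.slots: List[Entry] = [None] * size
        self.next_seq = 0  # assumed age stamp, incremented at each dispatch

    def dispatch(self, instr: str) -> int:
        """Place instr in an empty slot (here the lowest-numbered one) and return its index."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = (self.next_seq, instr)
                self.next_seq += 1
                return i
        raise RuntimeError("queue full")

    def remove(self, i: int) -> str:
        """Free slot i, e.g. when its instruction issues; no other entry moves."""
        entry = self.slots[i]
        if entry is None:
            raise ValueError(f"slot {i} is empty")
        self.slots[i] = None
        return entry[1]

    def find_oldest(self) -> Optional[int]:
        """Compare the age stamp of every occupied slot; position alone says nothing about age."""
        oldest: Optional[int] = None
        for i, slot in enumerate(self.slots):
            if slot is None:
                continue
            if oldest is None or slot[0] < self.slots[oldest][0]:
                oldest = i
        return oldest

    def issue_oldest(self) -> Optional[str]:
        i = self.find_oldest()
        return None if i is None else self.remove(i)


q = NonMovingQueue(4)
for instr in "abc":
    q.dispatch(instr)    # a, b, c go to slots 0, 1, 2
q.remove(1)              # b issues out of order
q.dispatch("d")          # d reuses slot 1 although it is the youngest entry
print(q.find_oldest())   # 0
print(q.issue_oldest())  # a
print(q.find_oldest())   # 2: c is older than d, which sits in a lower slot
```

In hardware, the equivalent of this scan is a comparison of age information across all queue entries, whose delay grows with the size of the queue.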
Searching the queue for the oldest instruction is time-consuming and therefore ill-suited to high-frequency processors. Thus, moving instruction queues are typically employed in high-frequency processors. In a moving instruction queue, the queue is compressed every cycle so that empty locations are always at one end of the queue (e.g., the top), and new instructions are generally dispatched to that end. This ensures that the bottom-most instruction is always the oldest instruction in the queue.
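For comparison, a moving queue can be sketched in the same hypothetical style. Slot 0 is taken as the bottom; `compress` removes issued entries and shifts the survivors down while preserving their order, so `oldest` is always slot 0. The `moves` counter, which counts entries whose slot changes, is only a rough illustrative proxy for the extra switching activity that compression implies, not a power model.

```python
from typing import List, Optional, Set


class MovingQueue:
    """Hypothetical model of a moving (compressing) queue; slot 0 is the bottom."""

    def __init__(self, size: int):
        self.size = size
        self.slots: List[Optional[str]] = [None] * size
        self.moves = 0  # entries whose slot changed during compression (illustrative only)

    def dispatch(self, instr: str) -> int:
        """Write instr into the first empty slot, just above the youngest entry."""
        i = self.slots.index(None)  # list.index raises ValueError if the queue is full
        self.slots[i] = instr
        return i

    def oldest(self) -> Optional[str]:
        """After compression, the bottom-most entry is always the oldest."""
        return self.slots[0]

    def compress(self, issued: Set[int]) -> None:
        """Drop the issued slots and shift the survivors down, preserving their order."""
        survivors = [(i, s) for i, s in enumerate(self.slots)
                     if s is not None and i not in issued]
        self.moves += sum(1 for new_i, (old_i, _) in enumerate(survivors) if new_i != old_i)
        self.slots = [s for _, s in survivors] + [None] * (self.size - len(survivors))


q = MovingQueue(4)
for instr in "abc":
    q.dispatch(instr)
q.compress({0})             # a issues; b and c each shift down one slot
print(q.slots, q.moves)     # ['b', 'c', None, None] 2
q.dispatch("d")
q.compress({1})             # c issues out of order; only d shifts
print(q.oldest(), q.moves)  # b 3
```

Finding the oldest entry becomes trivial, but every issue cycle may rewrite several slots.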
However, compressing the queue every cycle may consume more power than searching the queue for its oldest instruction would. As is well known in the field, power consumption translates largely into heat generation, which degrades performance.
Thus, what is needed is an apparatus, system and method for quickly determining an oldest instruction in a non-moving instruction queue.