1. Field of the Invention
This invention is related to the field of superscalar microprocessors and, more particularly, to retiring instructions from reorder buffers within superscalar microprocessors.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
In order to increase performance, superscalar microprocessors often employ out of order execution. The instructions within a program are ordered, such that a first instruction is intended to be executed before a second instruction, etc. When the instructions are executed in the order specified, the intended functionality of the program is realized. However, instructions may be executed in any order as long as the original functionality is maintained. For example, a second instruction which does not depend upon a first instruction may be executed prior to the first instruction, even if the first instruction is prior to the second instruction in program order. A second instruction depends upon a first instruction if a result produced by the first instruction is employed as an operand of the second instruction. The second instruction is said to have a dependency upon the first instruction.
Another hazard of out of order execution occurs when two instructions update the same destination storage location. If the instruction which is second in the original program sequence executes first, then that instruction must not update the destination until the first instruction has executed. Often, superscalar microprocessors employ a reorder buffer in order to correctly handle dependency checking and multiple updates to a destination, among other things. Instructions are stored into the reorder buffer in program order, typically as the instructions are dispatched to execution units (perhaps being stored in reservation stations associated therewith).
The results of the instructions are stored into the destinations from the reorder buffer in program order. However, results may be provided to the reorder buffer in any order. The reorder buffer stores each result with the instruction which generated the result until that instruction is retired, i.e., selected for storing its result into the destination.
A reorder buffer is configured to store a finite number of instructions, defining a maximum number of instructions which may be concurrently outstanding within the superscalar microprocessor. Generally speaking, out of order execution occurs more frequently as the finite number is increased. For example, the execution of an instruction which is foremost within the reorder buffer in program order may be delayed. Instructions subsequently dispatched into the reorder buffer which are not dependent upon the delayed instruction may execute and store results in the buffer. Out of order execution may continue until the reorder buffer becomes full, at which point dispatch is suspended until instructions are retired from the reorder buffer. Therefore, a larger number of storage locations within the reorder buffer generally leads to increased performance by allowing more instructions to be outstanding before instruction dispatch (and out of order execution) stalls.
Unfortunately, larger reorder buffers complicate dependency checking. One or more source operands of an instruction to be dispatched may be destination operands of outstanding instructions within the reorder buffer. As used herein, a source operand of an instruction is a value to be operated upon by the instruction in order to produce a result. Conversely, a destination operand is the result of the instruction. Source and destination operands of an instruction are generally referred to as operand information. An instruction specifies the location storing the source operands and the location in which to store the destination operand. An operand may be stored in a register (a "register operand") or a memory location (a "memory operand"). As used herein, a register is a storage location included within the microprocessor which is used to store instruction results. Registers may be specified as source or destination storage locations for an instruction. A register file is a storage device for a plurality of registers with ports for writing to and reading from the registers.
The locations from which to retrieve source operands for an instruction to be dispatched are compared to the locations designated for storing destination operands of instructions stored within the reorder buffer. If a dependency is detected and the corresponding instruction has executed, the result stored in the reorder buffer may be forwarded for use by the dispatching instruction. If the instruction has not yet executed, a tag identifying the instruction may be forwarded such that the result may be provided when the instruction is executed.
When the number of instructions storable in the reorder buffer is large, the number of comparisons for performing dependency checking is also large. Generally speaking, the total number of comparisons which must be provided for is the number of possible operands of an instruction multiplied by the number of instructions which may be concurrently dispatched, further multiplied by the number of instructions which may be stored in the reorder buffer. As the number of instructions increases, the amount of circuitry employed to perform the comparisons increases dramatically. The large amount of circuitry may undesirably increase the total amount of circuitry employed by the microprocessor. Additionally, more than one destination operand within the reorder buffer may be stored within the storage location indicated for a source operand. Circuitry is therefore employed to detect the last of the destination operands indicated by the comparisons, in order to correctly detect the dependency (i.e. the instruction which stores a result into a storage location used for a source operand and which is nearest to the dispatching instruction in program order is the instruction upon which the dispatching instruction depends).
After instructions in the reorder buffer are executed and results are obtained, the instructions are ready to be retired. When an instruction is retired the result of the instruction is stored to the location specified by the destination operand, and the storage space within the reorder buffer allocated to the instruction is deallocated. This allows new instructions to be dispatched and stored in the reorder buffer.
The oldest instructions in the reorder buffer, i.e. the ones that are first in the original program sequence, are monitored for retirement. In a typical retirement cycle, a plurality of the oldest instructions are monitored for retirement. The number of instructions monitored for retirement is called a retire window. Instructions within the retire window that have executed and the results obtained are retired. In the next retire cycle, the retire window monitors the oldest remaining instructions in the reorder buffer.
The reorder buffer has a fixed number of write ports to store data to a register file. The retire window of the reorder buffer typically monitors as many instructions as there are register file write ports. If two instructions retired during one retire cycle have the same register destination operand, only the data from the "newer instruction" (i.e. later in the original program sequence) is actually stored to the register file. In this situation, one of the register ports is not used.
Failure to fully utilize all the write ports can delay the deallocation of memory space within the reorder buffer. Therefore, the reorder buffer must be larger to prevent the reorder buffer from becoming full and stalling instruction dispatch. What is desired is a reorder buffer that makes maximum utilization of the available write ports.