1. Field of the Invention
This invention is related to the field of superscalar microprocessors and, more particularly, to reorder buffers within superscalar microprocessors.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
In order to increase performance, superscalar microprocessors often employ out of order execution. The instructions within a program are ordered, such that a first instruction is intended to be executed before a second instruction, etc. When the instructions are executed in the order specified, the intended functionality of the program is realized. However, instructions may be executed in any order as long as the original functionality is maintained. For example, a second instruction which does not depend upon a first instruction may be executed prior to the first instruction, even if the first instruction is prior to the second instruction in program order. A second instruction depends upon a first instruction if a result produced by the first instruction is employed as an operand of the second instruction. The second instruction is said to have a dependency upon the first instruction.
Another hazard of out of order execution occurs when two instructions update the same destination storage location. If the instruction which is second in the original program sequence executes first, then that instruction must not update the destination until the first instruction has executed. Often, superscalar microprocessors employ a reorder buffer in order to correctly handle dependency checking and multiple updates to a destination, among other things. Instructions are stored into the reorder buffer in program order, typically as the instructions are dispatched to execution units (perhaps being stored in reservation stations associated therewith). The results of the instructions are stored into the destinations from the reorder buffer in program order. However, results may be provided to the reorder buffer in any order. The reorder buffer stores each result with the instruction which generated the result until that instruction is selected for storing its result into the destination.
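The behavior described above may be illustrated by a minimal software sketch. The sketch is not a description of any particular circuit; the class name `ReorderBuffer`, its method names, and the dictionary used to model the destination registers are assumptions introduced purely for illustration. It shows entries allocated in program order, results arriving in any order, and destinations updated strictly in program order.

```python
from collections import deque


class ReorderBuffer:
    """Toy model (hypothetical names): entries enter and leave in program order."""

    def __init__(self):
        self.entries = deque()  # oldest instruction is at the left
        self.next_tag = 0

    def dispatch(self, dest):
        """Allocate an entry in program order and return its tag."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "result": None, "done": False})
        return tag

    def write_result(self, tag, value):
        """Record a result; results may be provided in any order."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["result"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self, registers):
        """Store completed results into destinations, oldest first.

        Stops at the first entry that has not yet produced a result.
        """
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            registers[entry["dest"]] = entry["result"]
            retired.append(entry["tag"])
        return retired


regs = {}
rob = ReorderBuffer()
t0 = rob.dispatch("r1")   # first in program order
t1 = rob.dispatch("r1")   # second, same destination
rob.write_result(t1, 20)  # the second instruction executes first
early = rob.retire(regs)  # nothing retires while t0 is outstanding
rob.write_result(t0, 10)
late = rob.retire(regs)   # both retire in program order
```

In this sketch the second instruction's result waits in the buffer until the first has executed, so the destination finally holds the value produced by the instruction which is last in program order.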
A reorder buffer is configured to store a finite number of instructions, defining a maximum number of instructions which may be concurrently outstanding within the superscalar microprocessor. Generally speaking, out of order execution occurs more frequently as the finite number is increased. For example, the execution of an instruction which is foremost within the reorder buffer in program order may be delayed. Instructions subsequently dispatched into the reorder buffer which are not dependent upon the delayed instruction may execute and store results in the buffer. Out of order execution may continue until the reorder buffer becomes full, at which point dispatch is suspended until instructions are deleted from the reorder buffer. Therefore, a larger number of storage locations within the reorder buffer generally leads to increased performance by allowing more instructions to be outstanding before instruction dispatch (and out of order execution) stalls.
Unfortunately, larger reorder buffers complicate dependency checking. One or more source operands of an instruction to be dispatched may be destination operands of outstanding instructions within the reorder buffer. As used herein, a source operand of an instruction is a value to be operated upon by the instruction in order to produce a result. Conversely, a destination operand is the result of the instruction. Source and destination operands of an instruction are generally referred to as operand information. An instruction specifies the location storing the source operands and the location in which to store the destination operand. An operand may be stored in a register (a "register operand") or a memory location (a "memory operand"). As used herein, a register is a storage location included within the microprocessor which is used to store instruction results. Registers may be specified as source or destination storage locations for an instruction.
The locations from which to retrieve source operands for an instruction to be dispatched are compared to the locations designated for storing destination operands of instructions stored within the reorder buffer. If a dependency is detected and the corresponding instruction has executed, the result stored in the reorder buffer may be forwarded for use by the dispatching instruction. If the instruction has not yet executed, a tag identifying the instruction may be forwarded such that the result may be provided when the instruction is executed.
When the number of instructions storable in the reorder buffer is large, the number of comparisons for performing dependency checking is also large. Generally speaking, the total number of comparisons which must be provided for is the number of possible operands of an instruction multiplied by the number of instructions which may be concurrently dispatched, further multiplied by the number of instructions which may be stored in the reorder buffer. Additionally, more than one destination operand within the reorder buffer may be stored within the storage location indicated for a source operand. Circuitry is therefore employed to detect the last of the destination operands indicated by the comparisons, in order to correctly detect the dependency (i.e. the instruction which stores a result into a storage location used for a source operand and which is nearest to the dispatching instruction in program order is the instruction upon which the dispatching instruction depends). It is desirable to reduce the complexity of dependency checking for reorder buffers.
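The comparison-based dependency check and its cost may be sketched as follows. The function names, the entry layout, and the example sizes (two source operands, three dispatched instructions, sixteen buffer entries) are hypothetical values chosen for illustration only. Hardware would perform the comparisons in parallel and then prioritize the matches; the sequential newest-to-oldest scan below merely models the outcome of that prioritization.

```python
def find_dependency(rob_entries, source_reg):
    """Return what is forwarded for source_reg, or None if no entry matches.

    rob_entries is ordered oldest to newest (program order). The newest
    matching entry is the one the dispatching instruction depends upon.
    """
    for entry in reversed(rob_entries):
        if entry["dest"] == source_reg:
            if entry["done"]:
                return ("value", entry["result"])  # forward the stored result
            return ("tag", entry["tag"])           # forward a tag instead
    return None


def comparator_count(operands_per_instruction, dispatch_width, rob_size):
    """Comparisons to provide for: operands x dispatch width x buffer size."""
    return operands_per_instruction * dispatch_width * rob_size


# Hypothetical buffer contents: two outstanding writers of r1.
entries = [
    {"tag": 0, "dest": "r1", "done": True, "result": 5},
    {"tag": 1, "dest": "r2", "done": False, "result": None},
    {"tag": 2, "dest": "r1", "done": False, "result": None},
]

dep_r1 = find_dependency(entries, "r1")  # newest writer of r1 is tag 2
dep_r3 = find_dependency(entries, "r3")  # no outstanding writer
total = comparator_count(2, 3, 16)
```

Under these assumed sizes the product is 96 comparisons, and it grows linearly with the number of reorder buffer entries.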
Still further, reorder buffers typically allocate a storage location for each instruction dispatched during a particular clock cycle. The number of storage locations allocated varies from clock cycle to clock cycle depending upon the number of instructions dispatched. Additionally, a variable number of instructions may be retired from the reorder buffer. Logic for allocating and deallocating storage locations is complicated by the variable nature of storage access, creating a larger and typically slower control unit used in the reorder buffer. A faster, simpler method for allocating reorder buffer storage is desired.
The problems outlined above are in large part solved by a reorder buffer in accordance with the present invention. The reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results and information regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. Advantageously, the amount of storage allocated is constant. Therefore, storage allocation logic depends only upon whether or not instructions are dispatched during a clock cycle. In particular, allocation logic is independent of the number of instructions dispatched during a clock cycle. Allocation logic may thereby be simplified, allowing for higher frequency applications.
Similarly, instructions are retired from the reorder buffer after each of the instructions within a line of storage has provided a result. The instructions within the line are retired simultaneously. Therefore, the amount of storage deallocated during a clock cycle is dependent only upon whether or not instructions are retired during the clock cycle, not upon the number of instructions retired. Advantageously, storage deallocation logic may be simplified as well.
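Line-oriented allocation and deallocation may be illustrated by the following sketch. The line width of three, the class name `LineReorderBuffer`, and the use of a line identifier are assumptions made for the example; the text above does not fix any of them. The point shown is that one whole line is allocated whenever at least one instruction is dispatched, and one whole line is released only when every instruction within it has provided a result.

```python
from collections import deque

LINE_WIDTH = 3  # assumed maximum number of concurrently dispatchable instructions


class LineReorderBuffer:
    """Toy model (hypothetical names) of a reorder buffer organized in lines."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = deque()  # oldest line at the left
        self.next_line_id = 0

    def dispatch(self, names):
        """Allocate one full line if anything is dispatched.

        Returns the line id, or None if nothing was dispatched or the
        buffer is full (in which case dispatch would stall).
        """
        if not names:
            return None
        if len(names) > LINE_WIDTH:
            raise ValueError("too many instructions for one line")
        if len(self.lines) == self.num_lines:
            return None
        slots = [{"name": n, "done": False} for n in names]
        slots += [None] * (LINE_WIDTH - len(names))  # unused positions in the line
        line_id = self.next_line_id
        self.next_line_id += 1
        self.lines.append({"id": line_id, "slots": slots})
        return line_id

    def complete(self, line_id, slot):
        """Mark the instruction in the given slot as having provided its result."""
        for line in self.lines:
            if line["id"] == line_id:
                line["slots"][slot]["done"] = True
                return
        raise KeyError(line_id)

    def retire(self):
        """Retire the oldest line as a unit once all of its instructions are done."""
        if not self.lines:
            return []
        head = self.lines[0]
        if all(s is None or s["done"] for s in head["slots"]):
            self.lines.popleft()
            return [s["name"] for s in head["slots"] if s is not None]
        return []


rob = LineReorderBuffer(num_lines=2)
line0 = rob.dispatch(["add", "sub"])  # two instructions still consume a whole line
line1 = rob.dispatch(["mul"])         # one instruction also consumes a whole line
stalled = rob.dispatch(["div"])       # buffer full: no line available
rob.complete(line0, 1)
partial = rob.retire()                # "add" is still outstanding, so nothing retires
rob.complete(line0, 0)
retired = rob.retire()                # the whole line retires at once
```

Note that neither `dispatch` nor `retire` needs to count instructions in order to decide how much storage to allocate or release; the decision is simply whether a line is taken or freed.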
In one embodiment, a microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. Since the issue positions are symmetrical, any random group of instructions executable by the issue positions may be dispatched to the issue positions. In contrast, asymmetrical issue positions may impose additional restrictions upon the concurrent dispatch and execution of instructions. Increasing the average number of concurrently dispatched instructions may be particularly beneficial when employed with the line-oriented reorder buffer, since a line of storage is allocated regardless of the number of instructions dispatched. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases.
One particular implementation of the reorder buffer includes a future file for reducing dependency checking complexity. The future file replaces the large block of comparators and prioritization logic ordinarily employed by reorder buffers for dependency checking. The future file includes a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file. The reorder buffer provides the value (either reorder buffer tag or instruction result) stored in the storage location corresponding to a register when the register is used as a source operand for another instruction.
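A future file of this kind may be sketched in software as a table indexed by register. The class name `FutureFile`, the tuple encoding of a stored tag versus a stored result, and the rule that a result is accepted only if its tag still matches the stored tag are assumptions of this sketch; the last is one plausible way of ensuring that the location always reflects the last instruction in program order to update the register.

```python
class FutureFile:
    """Toy model (hypothetical names): one storage location per register."""

    def __init__(self, initial_values):
        # Each location holds either ("value", v) or ("tag", t).
        self.slots = {reg: ("value", v) for reg, v in initial_values.items()}

    def read(self, reg):
        """Source operand lookup: a single indexed read, no comparators."""
        return self.slots[reg]

    def dispatch(self, dest_reg, rob_tag):
        """The newly dispatched instruction becomes the last writer of dest_reg."""
        self.slots[dest_reg] = ("tag", rob_tag)

    def write_result(self, dest_reg, rob_tag, value):
        """Replace the tag with the result, but only for the last writer."""
        if self.slots[dest_reg] == ("tag", rob_tag):
            self.slots[dest_reg] = ("value", value)


ff = FutureFile({"r1": 0, "r2": 7})
ff.dispatch("r1", 4)           # older writer of r1
ff.dispatch("r1", 5)           # newer writer of r1
before = ff.read("r1")         # the tag of the newer writer
ff.write_result("r1", 4, 111)  # older result arrives; the location is unchanged
still = ff.read("r1")
ff.write_result("r1", 5, 222)  # newer result arrives; the tag is replaced by data
after = ff.read("r1")
```

A source operand lookup in this model is one table read per operand, independent of how many instructions are outstanding in the reorder buffer.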
Another advantage of the future file for microprocessors which allow access and update to portions of registers is that narrow-to-wide dependencies are resolved upon completion of the instruction which updates the narrower register. Previously, the instruction which updates the narrower register might typically be retired prior to resolving the narrow-to-wide dependency. Generally, instruction retirement occurs subsequent to completion of the instruction. Performance of the microprocessor may be increased due to the earlier resolution of the narrow-to-wide dependencies.
Broadly speaking, the present invention contemplates a reorder buffer comprising an instruction storage and a first control unit. The instruction storage includes multiple lines of storage, wherein each of the lines of storage is configured to store a predefined maximum number of instructions concurrently receivable by the reorder buffer. Coupled to the instruction storage, the first control unit is configured to allocate one of the lines of storage to one or more concurrently received instructions. One of the lines of storage is allocated regardless of a number of the concurrently received instructions.
The present invention further contemplates an apparatus for reordering instructions which were executed out of order, comprising a first decode unit, a second decode unit, and a reorder buffer. The first decode unit is configured to decode and dispatch a first instruction. Similarly, the second decode unit is configured to decode and dispatch a second instruction concurrent with the first instruction. Coupled to both the first decode unit and the second decode unit, the reorder buffer is configured to allocate a line of storage to store instruction results corresponding to the first instruction and the second instruction upon dispatch of the first instruction and the second instruction. The line of storage is configured to store a maximum number of concurrently dispatchable instructions and is allocated regardless of a number of concurrently dispatched instructions provided at least one instruction is dispatched.
The present invention still further contemplates a method for operating a reorder buffer. Up to a predefined maximum number of concurrently dispatched instructions are received into the reorder buffer. Upon receipt of the concurrently dispatched instructions, a fixed amount of storage is allocated for instruction results. The fixed amount of storage is sufficient to store the maximum number of concurrently dispatched instructions regardless of a number of concurrently dispatched instructions. The fixed amount of storage is subsequently deallocated upon receipt of an instruction result corresponding to each of the concurrently dispatched instructions.
The present invention additionally contemplates a method for ordering instructions in a microprocessor employing out of order execution. Up to a maximum number of instructions are concurrently dispatched. A line of storage is allocated within a reorder buffer for storing instruction results corresponding to the instructions which are concurrently dispatched. The line of storage is configured to store a number of instruction results equal to the maximum number of instructions. The instructions are executed in a plurality of functional units. Upon execution, corresponding instruction results are provided to the reorder buffer. The line of storage is deallocated when each of the corresponding instruction results within the line of storage has been provided.
The present invention also contemplates a superscalar microprocessor comprising a plurality of fixed, symmetrical issue positions and a reorder buffer. The plurality of fixed, symmetrical issue positions is coupled to receive instructions. An instruction received by one of the plurality of issue positions remains within that one of the plurality of issue positions until the instruction is executed therein. Coupled to receive operand information regarding a plurality of concurrently dispatched instructions from the plurality of fixed, symmetrical issue positions, the reorder buffer is configured to allocate storage for instruction results corresponding to the plurality of concurrently dispatched instructions.
Furthermore, the present invention contemplates a superscalar microprocessor comprising a first and second decode unit, a first and second reservation station, and a reorder buffer. The first decode unit is configured to decode a first instruction. Similarly, the second decode unit is configured to decode a second instruction concurrently with the first decode unit decoding the first instruction. Coupled to receive the first instruction from the first decode unit, the first reservation station is configured to store the first instruction until the first instruction is executed. Likewise, the second reservation station is coupled to receive the second instruction from the second decode unit and to store the second instruction until the second instruction is executed. The reorder buffer is coupled to the first decode unit and the second decode unit, and receives an indication of the first instruction and the second instruction from the first decode unit and the second decode unit, respectively. Additionally, the reorder buffer is configured to allocate a line of storage to store a first instruction result corresponding to the first instruction and a second instruction result corresponding to the second instruction. The line of storage comprises a fixed amount of storage capable of storing instruction results corresponding to a maximum number of concurrently dispatchable instructions.
Moreover, the present invention contemplates a reorder buffer comprising an instruction storage, a future file, and a control unit. The instruction storage is configured to store instruction results corresponding to instructions. The instruction results are stored in lines of storage, wherein a line of storage is configured to store instruction results corresponding to a maximum number of concurrently dispatchable instructions. A first line of storage is allocated upon dispatch of at least one instruction regardless of a number of instructions concurrently dispatched. The future file is configured to store a reorder buffer tag corresponding to a particular instruction. The particular instruction is last, in program order, of the instructions represented within the instruction storage having a particular register as a destination operand. Additionally, the future file is further configured to store a particular instruction result corresponding to the particular instruction when the particular instruction result is provided. Coupled to the instruction storage and the future file, the control unit is configured to allocate the first line of storage for at least one instruction. Still further, the control unit is configured to update the future file if the instruction has the particular register as a destination operand.
The present invention yet further contemplates a reorder buffer comprising a future file, an instruction storage, and a control unit. The future file has a storage location for each register implemented by a microprocessor employing the reorder buffer. The storage location is divided into a first portion and a second portion, corresponding to a first portion and a second portion of the register, respectively. Each of the first and second portions of the storage location is configured to store a reorder buffer tag of an instruction which updates the corresponding portion of the register. Additionally, each of the first and second portions of the storage location is configured to store data corresponding to an instruction result of the instruction identified by the reorder buffer tag, wherein the data replaces the reorder buffer tag when the data is provided. The instruction storage is configured to store instruction results corresponding to multiple instructions outstanding within the microprocessor. Coupled to the future file and the instruction storage, the control unit is configured to allocate storage within the instruction storage upon dispatch of at least one instruction. Still further, the control unit is configured to store a first reorder buffer tag into the first portion of the storage location if at least one instruction updates the first portion of the register. Similarly, the control unit is further configured to store the first reorder buffer tag into the second portion of the storage location if at least one instruction updates the second portion of the register.
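The divided storage location may be illustrated by the following sketch, in which each register has two independently tracked portions. The portion names "low" and "high", the class name `SplitFutureFile`, the zero initial values, and the tag-match rule for accepting data are assumptions made for the example. The sketch shows a wide read following a narrow update becoming fully resolved as soon as the narrow update completes, without waiting for retirement.

```python
class SplitFutureFile:
    """Toy model (hypothetical names): each register location has two portions."""

    PORTIONS = ("low", "high")

    def __init__(self, registers):
        # Each portion holds either ("value", v) or ("tag", t).
        self.slots = {
            reg: {p: ("value", 0) for p in self.PORTIONS} for reg in registers
        }

    def dispatch(self, reg, portions, rob_tag):
        """Store the tag only into the portions the instruction updates."""
        for p in portions:
            self.slots[reg][p] = ("tag", rob_tag)

    def write_result(self, reg, rob_tag, data_by_portion):
        """Data replaces the tag in each portion still owned by this instruction."""
        for p, data in data_by_portion.items():
            if self.slots[reg][p] == ("tag", rob_tag):
                self.slots[reg][p] = ("value", data)

    def read(self, reg, portions):
        """Return the tag or data for each requested portion."""
        return {p: self.slots[reg][p] for p in portions}


ff = SplitFutureFile(["r0"])
ff.dispatch("r0", ("low", "high"), 1)  # wide update of the whole register
ff.dispatch("r0", ("low",), 2)         # narrow update of the low portion only
pending = ff.read("r0", ("low", "high"))
ff.write_result("r0", 1, {"low": 0x34, "high": 0x12})  # wide result completes
mixed = ff.read("r0", ("low", "high"))
ff.write_result("r0", 2, {"low": 0xFF})                # narrow result completes
resolved = ff.read("r0", ("low", "high"))
```

After the wide result arrives, only the high portion holds data, because the low portion now belongs to the later narrow update. Once the narrow result arrives, a wide read obtains data for both portions directly from the future file.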
The present invention additionally contemplates a microprocessor comprising a reorder buffer and a register file. The reorder buffer includes a future file having a storage location for each register implemented by the microprocessor. The storage location is divided into a first portion and a second portion, wherein the first portion of the storage location corresponds to a first portion of the register and the second portion of the storage location corresponds to a second portion of the register. Each of the first portion of the storage location and the second portion of the storage location is configured to store a reorder buffer tag of an instruction which updates the first portion of the register and the second portion of the register, respectively. Still further, each of the first and second portions of the storage location is configured to store data corresponding to an instruction result of the instruction identified by the reorder buffer tag, respectively. The data replaces the reorder buffer tag when the data is provided. Coupled to the reorder buffer, the register file is configured to store a plurality of values corresponding to the registers implemented by the microprocessor. The reorder buffer updates the register file upon retirement of the instruction.