1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to reorder buffers used within microprocessors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
Generally speaking, a given instruction has one or more source operands which are input values to be operated upon by the microprocessor in response to the given instruction. Each source operand is specified by the instruction via a source operand specifier. The source operand specifier identifies a storage location which stores the corresponding source operand. In the x86 microprocessor architecture, for example, a source operand may be stored in a register or a memory location. If a source operand is stored in a register, the source operand specifier identifies one of the registers defined for the instruction set. The identified register stores the source operand. Additionally, the given instruction typically has a destination operand. The destination operand is the result of the instruction. A destination operand is stored into a location specified by a destination operand specifier, similar to the source operand specifier. It is noted that operand specifiers are sometimes referred to as operand addresses.
In order to locate a larger number of instructions which may be concurrently executed, superscalar microprocessors often employ out of order execution. If instructions are executed in order (i.e. "program order", or the order of instructions as listed in the program sequence being executed), then the number of instructions which may be concurrently executed is limited by dependencies between the instructions. A dependency exists between a first instruction and a second instruction if the second instruction receives a value produced via execution of the first instruction (the "result" of the first instruction) as a source operand. Since the second instruction needs the result of the first instruction prior to executing, the first and second instructions cannot be concurrently executed. However, an instruction subsequent to the second instruction which does not depend upon either the first instruction or the second instruction may be concurrently executed with the first instruction.
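By way of illustration only, the dependency rule described above may be sketched in Python as follows. The names Instr, depends_on, and can_issue_with_first, as well as the register names, are hypothetical and are not part of the foregoing description. The sketch assumes each register is written at most once in the sequence, so that only the "result consumed as a source operand" dependency is modeled.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str               # destination operand specifier (a register name)
    srcs: Tuple[str, ...]   # source operand specifiers (register names)


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` uses the result of `earlier` as a source operand."""
    return earlier.dest in later.srcs


def can_issue_with_first(program: List[Instr]) -> List[str]:
    """Names of instructions that may execute concurrently with program[0].

    An instruction is excluded if it depends, directly or through a chain
    of earlier instructions, on the first instruction.
    """
    dependent = {0}   # indices that (transitively) depend on program[0]
    free = []
    for i in range(1, len(program)):
        if any(depends_on(program[i], program[j]) for j in dependent):
            dependent.add(i)
        else:
            free.append(program[i].name)
    return free


# Hypothetical three-instruction sequence in program order.
program = [
    Instr("I1", "r1", ("r2", "r3")),
    Instr("I2", "r4", ("r1", "r5")),   # reads r1, so it depends on I1
    Instr("I3", "r6", ("r7", "r8")),   # depends on neither I1 nor I2
]
```

In this hypothetical sequence, I2 must wait for the result of I1, while I3 is a candidate for concurrent execution with I1.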
Microprocessors which implement out of order execution often employ a reorder buffer for storing speculatively generated instruction results until the corresponding instructions become non-speculative. After the corresponding instructions become non-speculative, the instruction results may be moved from the reorder buffer to the storage location indicated by the destination operand specifier. Generally, a particular instruction becomes non-speculative when each instruction which may cause an exception and which is prior to the particular instruction in program order has executed and reported no exception. Often, reorder buffers are configured to store the instruction results into the destination storage locations (i.e. retire the instructions) in program order.
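The in-order retirement behavior described above may be sketched, under simplifying assumptions, as follows. The class ReorderBuffer and its methods dispatch, write_result, and retire are hypothetical names introduced for illustration. Exceptions are not modeled: an entry is treated as non-speculative once it and every older entry have produced a result.

```python
from collections import deque


class ReorderBuffer:
    """Minimal model: results complete in any order but retire in program order."""

    def __init__(self):
        self.entries = deque()   # oldest instruction at the left
        self.regfile = {}        # architectural destination storage
        self.next_tag = 0

    def dispatch(self, dest):
        """Allocate an entry in program order and return its reorder buffer tag."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})
        return tag

    def write_result(self, tag, value):
        """Record a speculatively generated result for the entry with this tag."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self):
        """Retire completed entries from the oldest end only; return their tags."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.regfile[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired
```

A younger instruction whose result arrives first remains in the buffer until every older entry has completed, at which point the results are written to the destination storage in program order.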
Because instruction results are held in the reorder buffer and the instruction results may be source operands for subsequent instructions, reorder buffers perform dependency checking between source operands of the subsequent instructions and the instructions represented within the reorder buffer. Dependency checking is performed in order to forward each source operand (or a reorder buffer tag which identifies the instruction result corresponding to that source operand, if the instruction result has not yet been generated via the execution of a prior instruction) to the execution units which receive the subsequent instructions (or to the reservation stations associated with the execution units). If a reorder buffer tag is forwarded, the execution unit monitors instruction results provided to the reorder buffer to capture, as a source operand, the instruction result corresponding to that reorder buffer tag. Generally speaking, dependency checking comprises comparing source operand specifiers of instructions to destination operand specifiers stored in the reorder buffer. The dependency (or lack thereof) of a particular source operand is said to be resolved if the dependency check performed in a given clock cycle results in a communication of the source operand, reorder buffer tag information, or a combination of operand values and tags to the execution unit (or reservation station) receiving the instruction having the particular source operand.
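The dependency check described above may be illustrated by the following minimal sketch. The function resolve_source and the dictionary layout of a reorder buffer entry are assumptions made for illustration. The sketch compares whole-register specifiers only, and forwards either an operand value or a reorder buffer tag.

```python
def resolve_source(src_reg, rob_entries, regfile):
    """Return ("value", v) or ("tag", t) for one source operand specifier.

    rob_entries is ordered oldest first; each entry is a dict with the keys
    "tag", "dest", "value", and "done". The youngest entry whose destination
    matches the source specifier is the one the operand depends upon.
    """
    for entry in reversed(rob_entries):
        if entry["dest"] == src_reg:
            if entry["done"]:
                return ("value", entry["value"])   # result already generated
            return ("tag", entry["tag"])           # result still pending
    # No matching entry in the reorder buffer: read the register file.
    return ("value", regfile[src_reg])


# Hypothetical reorder buffer contents, oldest entry first.
entries = [
    {"tag": 0, "dest": "EAX", "value": 5, "done": True},
    {"tag": 1, "dest": "EAX", "value": None, "done": False},
    {"tag": 2, "dest": "EBX", "value": 9, "done": True},
]
regfile = {"EAX": 1, "EBX": 2, "ECX": 4}
```

With these hypothetical contents, a source operand of EAX receives tag 1 because the youngest writer of EAX has not yet executed, EBX receives the value held in the reorder buffer, and ECX is read from the register file.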
In some cases, the dependencies for a particular source operand may not be resolved upon presentation of the corresponding instruction to the reorder buffer. For example, in the x86 microprocessor architecture, the registers comprise multiple fields which may be used as an operand of an instruction. The EAX register contains an AX field comprising the least significant 16 bits of the EAX register; an AH field comprising the most significant 8 bits of the AX field; and an AL field comprising the least significant 8 bits of the AX field. The EAX register, the AX field, the AH field, or the AL field may be used as an operand of the instruction.
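The field layout of the EAX register described above may be expressed with shifts and masks, as in the following sketch. The function names ax, ah, and al are chosen for illustration, and the register is modeled as a 32-bit unsigned integer.

```python
def ax(eax: int) -> int:
    """AX field: the least significant 16 bits of EAX."""
    return eax & 0xFFFF


def ah(eax: int) -> int:
    """AH field: the most significant 8 bits of the AX field."""
    return (eax >> 8) & 0xFF


def al(eax: int) -> int:
    """AL field: the least significant 8 bits of the AX field."""
    return eax & 0xFF


eax = 0x12345678
fields = (hex(ax(eax)), hex(ah(eax)), hex(al(eax)))   # ('0x5678', '0x56', '0x78')
```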
Unfortunately, if an instruction uses a particular field of a register as a destination operand and a subsequent instruction uses a field including the particular field and at least one other field as a source operand, a "narrow to wide" dependency exists. For example, an instruction which updates the AL field followed by an instruction which reads the AX field creates such a dependency. Because the destination of the instruction forms only part of the source operand of the subsequent instruction, a single reorder buffer tag or a single operand value is insufficient to describe the dependency. Typically, the receiving execution unit or reservation station is configured to receive one reorder buffer tag or one operand value from the reorder buffer for each source operand. Additional hardware would be needed within the receiving unit to receive a combination of reorder buffer tags and operand values for a particular source operand. Even more hardware would be needed to multiplex together the source operand from the multiple operand values received.
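One way to recognize a "narrow to wide" dependency is to represent each field as a bit mask and to test whether the source field includes bits that the destination field does not supply. The following sketch assumes this mask-based representation; the table FIELD_MASK, the function classify, and the returned labels are hypothetical. Only fields of the EAX register are covered.

```python
# Bit positions occupied by each field within the 32-bit EAX register.
FIELD_MASK = {
    "EAX": 0xFFFFFFFF,
    "AX": 0x0000FFFF,
    "AH": 0x0000FF00,
    "AL": 0x000000FF,
}


def classify(dest_field: str, src_field: str) -> str:
    """Classify how a later read of src_field relates to a prior write of dest_field."""
    written = FIELD_MASK[dest_field]
    read = FIELD_MASK[src_field]
    if written & read == 0:
        return "none"            # the fields do not overlap
    if read & ~written == 0:
        return "covered"         # the writer supplies every bit that is read
    return "narrow-to-wide"      # some bits read come from elsewhere
```

For instance, classify("AL", "AX") yields "narrow-to-wide" because the AH portion of AX is not produced by the instruction writing AL, whereas classify("AX", "AL") yields "covered" and a single tag or value suffices.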
Typically, reorder buffers couple the storing of instructions therein with the forwarding of operands or tags for the instructions. If one or more unresolved dependencies are detected, the instructions being dispatched are stalled until the dependencies can be resolved. Often, the resulting structure is a reorder buffer having an instruction storage for storing instruction results and operand information for instructions which have been dispatched (and for which dependency information has been forwarded) as well as a set of storage locations for storing the reorder buffer inputs corresponding to a set of instructions attempting dispatch. If the set of instructions cannot be dispatched due to an unresolved dependency, the reorder buffer inputs are stored in the set of storage locations. In a subsequent clock cycle, the reorder buffer reattempts dependency checking and forwarding of the set of instructions. When all dependencies can be resolved, instruction information corresponding to the instructions is stored into the instruction storage and the dependency information is forwarded.
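The coupled store-and-forward arrangement described above may be modeled, in simplified form, as follows. The class StallingDispatch, its attribute names, and the resolvable predicate are assumptions introduced for illustration; the predicate stands in for the dependency check and is supplied by the caller. The sketch does not model forwarding itself.

```python
class StallingDispatch:
    """Model of a reorder buffer front end that holds a stalled group of instructions."""

    def __init__(self):
        self.held = None                 # extra storage for stalled reorder buffer inputs
        self.instruction_storage = []    # entries for dispatched instructions

    def cycle(self, new_group, resolvable):
        """Process one clock cycle.

        new_group: list of instructions attempting dispatch, or None.
        resolvable: callable returning True if every dependency of the
                    group can be resolved in this cycle.
        """
        # Select between the held group and the new inputs (the multiplexer).
        group = self.held if self.held is not None else new_group
        if group is None:
            return "idle"
        if resolvable(group):
            # Store instruction information; dependency information is forwarded.
            self.instruction_storage.extend(group)
            self.held = None
            return "dispatched"
        # Unresolved dependency: keep the inputs and retry next cycle.
        self.held = group
        return "stalled"
```

In this model the selection between held and newly arriving inputs occurs before the dependency check can begin, and the held attribute corresponds to the additional set of storage locations.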
Unfortunately, the extra set of storage locations used to store stalled instructions occupies additional semiconductor die area which might advantageously be allocated to other functionality. Still further, the reorder buffer inputs must be multiplexed with those stored in the extra set of storage locations before dependency checking can be initiated. The amount of time within the clock cycle available for dependency checking is thereby decreased.