1. Field of the Invention
This invention is related to the field of processors and, more particularly, to instruction scheduling mechanisms within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by issuing and executing multiple instructions per clock cycle and by employing the highest possible clock frequency consistent with the design. One method for increasing the number of instructions executed per clock cycle is out of order execution. In out of order execution, instructions may be executed in a different order than that specified in the program sequence (or "program order"). Certain instructions near each other in a program sequence may have dependencies which prohibit their concurrent execution, while subsequent instructions in the program sequence may not have dependencies on the previous instructions. Accordingly, out of order execution may increase performance of the superscalar processor by increasing the number of instructions executed concurrently (on the average). Another method related to out of order execution is speculative execution, in which instructions are executed subsequent to other instructions which may cause program execution to proceed down a different path than the path containing the speculative instructions. For example, instructions may be speculative if the instructions are subsequent to a particular instruction which may cause an exception. Instructions are also speculative if the instructions are subsequent to a predicted conditional branch instruction which has not yet been executed. Similarly, instructions may be out of order or speculatively scheduled, issued, etc.
Unfortunately, scheduling instructions for out of order or speculative execution presents additional hardware complexities for the processor. The term "scheduling" generally refers to selecting an instruction for execution. Typically, the processor attempts to schedule instructions as rapidly as possible to maximize the average instruction execution rate (e.g. by executing instructions out of order to deal with dependencies and hardware availability for various instruction types). These complexities may limit the clock frequency at which the processor may operate. In particular, the dependencies between instructions must be respected by the scheduling hardware. Generally, as used herein, the term "dependency" refers to a relationship between a first instruction and a subsequent second instruction in program order which requires the execution of the first instruction prior to the execution of the second instruction. A variety of dependencies may be defined. For example, a source operand dependency occurs if a source operand of the second instruction is a destination operand of the first instruction.
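The source operand dependency described above may be sketched as follows. This is an illustrative model only; the instruction record and its "srcs" and "dests" register-number fields are hypothetical and not part of the disclosure:

```python
# Illustrative sketch of a source-operand dependency check. An instruction
# is modeled as a dict of destination and source register numbers.

def has_source_dependency(first, second):
    """True if a source register of `second` is a destination register of
    `first`, so `second` must execute after `first`."""
    return any(src in first["dests"] for src in second["srcs"])

# Register 1 is written by the first instruction and read by the second,
# so the second instruction depends on the first.
add = {"dests": [1], "srcs": [2, 3]}
sub = {"dests": [4], "srcs": [1, 5]}
```

A scheduler respecting this dependency would not select `sub` for execution until the result of `add` is available.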
Generally, instructions may have one or more source operands and one or more destination operands. The source operands are input values to be manipulated according to the instruction definition to produce one or more results (which are the destination operands). Source and destination operands may be memory operands stored in a memory location external to the processor, or may be register operands stored in register storage locations included within the processor. The instruction set architecture employed by the processor defines a number of architected registers. These registers are defined to exist by the instruction set architecture, and instructions may be coded to use the architected registers as source and destination operands. An instruction specifies a particular register as a source or destination operand via a register number (or register address) in an operand field of the instruction. The register number uniquely identifies the selected register among the architected registers. A source operand is identified by a source register number and a destination operand is identified by a destination register number.
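The register numbering described above may be illustrated with a hypothetical instruction encoding. The field positions below are invented for illustration; the text does not specify a particular encoding:

```python
# Assume 32 architected registers, so each register number occupies a
# 5-bit operand field. Field positions are hypothetical: bits [4:0] hold
# the destination register number, bits [9:5] and [14:10] the two source
# register numbers.

def encode_register_operands(dest, src1, src2):
    return dest | (src1 << 5) | (src2 << 10)

def decode_register_operands(insn_word):
    """Extract the destination and source register numbers that uniquely
    identify the selected registers among the architected registers."""
    dest = insn_word & 0x1F
    src1 = (insn_word >> 5) & 0x1F
    src2 = (insn_word >> 10) & 0x1F
    return dest, src1, src2
```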
In addition to operand dependencies, one or more types of ordering dependencies may be enforced by a processor. Ordering dependencies may be used, for example, to simplify the hardware employed or to generate correct program execution. By forcing certain instructions to be executed in order with respect to certain other instructions, hardware for handling consequences of the out of order execution of the instructions may be omitted. For example, instructions which update special registers containing general processor operating state may affect the execution of a variety of subsequent instructions which do not explicitly access the special registers. Generally, ordering dependencies may vary from microarchitecture to microarchitecture.
While the scheduling mechanism respects dependencies, it is desirable to be as aggressive as possible in scheduling instructions out of order and/or speculatively in an attempt to maximize the performance gain realized. For example, it may be desirable to schedule load memory operations prior to older store memory operations, since load memory operations more typically have dependent instructions. However, in some cases, a load memory operation may depend on an older store memory operation (e.g. the store memory operation updates at least one byte accessed by the load memory operation). In such cases, the load is incorrectly executed if executed prior to the store memory operation. A mechanism for allowing load memory operations to be scheduled prior to older store memory operations and for discovering and recovering from incorrect execution of a particular load memory operation prior to a particular older store memory operation is therefore desired.
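The load-before-store hazard described above reduces to detecting that an older store updates at least one byte accessed by an already-executed load. A minimal sketch, assuming byte addresses, access sizes, and sequence numbers for program order (all hypothetical):

```python
def bytes_overlap(addr_a, size_a, addr_b, size_b):
    """True if the two byte ranges share at least one byte."""
    return addr_a < addr_b + size_b and addr_b < addr_a + size_a

def load_must_reissue(load, store):
    """A speculatively executed load was incorrectly executed, and must be
    reissued, if an older store writes any byte the load read."""
    return (store["seq"] < load["seq"] and
            bytes_overlap(load["addr"], load["size"],
                          store["addr"], store["size"]))
```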
Additionally, memory operations may experience conditions over and above the dependencies which may prevent correct execution. For example, memory operations often require additional resources to complete execution: a memory operation which misses a data cache within the processor may require a miss buffer entry to store the address of the memory operand for fetching from main memory. As another example, a load memory operation may have a memory operand updated by one or more stores in a store buffer, but the data may not yet be available or may not be forwardable via the hardware associated with the store buffer. A scheduling mechanism which handles such situations is therefore desired.
The problems outlined above are in large part solved by a scheduler as described herein. The scheduler issues memory operations without regard to whether or not resources are available to handle each possible execution outcome of that memory operation. The scheduler also retains the memory operation after issuance. If a condition occurs which prevents correct execution of the memory operation, the memory operation is retried. The scheduler subsequently reschedules and reissues the memory operation in response to the retry. Advantageously memory operations may be aggressively scheduled and, if the memory operations do not complete execution, the memory operations are rescheduled again at a later point. Many memory operations may complete successfully during the initial issuance, and those memory operations which do not complete successfully are completed during a subsequent reissue (although some memory operations may be reissued multiple times before completing).
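The retain-and-retry behavior described above may be sketched as follows. The class and state names are invented for illustration and are not part of the disclosure; the essential point is that an issued operation remains in the scheduler and a retry simply returns it to a schedulable state:

```python
class SchedulerEntry:
    """An operation retained in the scheduler after issuance."""

    def __init__(self, op):
        self.op = op
        self.state = "not executed"

    def issue(self):
        self.state = "executing"     # selected for issue

    def retry(self):
        self.state = "not executed"  # retained; eligible for reissue

    def complete(self):
        self.state = "done"

def run_until_complete(entry, execute):
    """Issue, and reissue on retry, until the operation completes
    successfully. Returns the number of issuances."""
    issues = 0
    while entry.state != "done":
        entry.issue()
        issues += 1
        if execute(entry.op):
            entry.complete()
        else:
            entry.retry()
    return issues
```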
Additionally, in one embodiment, the scheduler may receive a retry type indicating the reason for retry. Certain retry types may indicate a delayed reissuance of the memory operation until the occurrence of a subsequent event. In response to such retry types, the scheduler monitors for the subsequent event and delays reissuance until the event is detected. For example, a load memory operation which misses the data cache is reissued to cause the memory operand to be stored into the destination operand. However, reissuance of the load memory operation is delayed until the fill data including the memory operand is being provided. Then, the load memory operation is reissued and may complete by receiving the fill data. As another example, a particular memory operation may be required to execute non-speculatively, and the determination of the requirement may occur during execution. The particular memory operation may be retried and may be inhibited from reissue until the particular memory operation becomes non-speculative.
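The delayed-reissue behavior may be sketched as a mapping from retry type to the event the scheduler monitors before permitting reissue. The retry type names and event names below are hypothetical; the text gives two examples (a data cache miss waiting for the fill data, and an operation that must wait until it is non-speculative):

```python
# Hypothetical mapping from retry type to the subsequent event which
# must be detected before the operation may be reissued.
WAIT_EVENT = {
    "cache_miss": "fill_data_arriving",    # reissue when fill is provided
    "must_be_nonspec": "now_nonspeculative",
}

class RetriedOp:
    def __init__(self, op, retry_type=None):
        self.op = op
        self.wait_event = WAIT_EVENT.get(retry_type)

    def can_reissue(self, observed_events):
        """Immediately reissuable unless the retry type named an event,
        in which case reissue is delayed until that event is observed."""
        return self.wait_event is None or self.wait_event in observed_events
```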
In one particular embodiment, the scheduler issues load memory operations without regard to older, unissued store memory operations. In other words, older, unissued store memory operations do not prevent the scheduling of a load memory operation. The scheduler includes a physical address buffer which stores the physical addresses accessed by load memory operations, as received by the scheduler during execution of those load memory operations. The scheduler also receives the store physical addresses corresponding to executing stores, and compares the store physical addresses to the load physical addresses in the physical address buffer. If the comparison indicates that the store memory operation updates at least one byte of the load memory operand and the store memory operation is older than the corresponding load memory operation, the corresponding load memory operation is reissued to receive the correct memory operand. Additionally, each dependent instruction operation is reissued to ensure that each dependent instruction operation is executed using the correct source operands.
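The physical address buffer comparison may be sketched as follows, assuming simple sequence numbers for program order and byte-range overlap as the "updates at least one byte" test. Buffer sizing and the associative (CAM) hardware details are omitted:

```python
class PhysicalAddressBuffer:
    """Records the physical addresses of executed loads and, for each
    executing store, reports the younger loads the store overlaps."""

    def __init__(self):
        self.loads = []  # (load_seq, phys_addr, size) per executed load

    def record_load(self, seq, phys_addr, size):
        self.loads.append((seq, phys_addr, size))

    def check_store(self, store_seq, store_addr, store_size):
        """Return sequence numbers of younger, already-executed loads
        whose memory operand the older store updates; each such load
        must be reissued (along with its dependents)."""
        hits = []
        for load_seq, addr, size in self.loads:
            store_is_older = store_seq < load_seq
            overlap = (addr < store_addr + store_size and
                       store_addr < addr + size)
            if store_is_older and overlap:
                hits.append(load_seq)
        return hits
```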
In yet another particular embodiment, the scheduler includes a store tag buffer which receives an identifier of an older store memory operation which is determined, during the execution of a load memory operation, to update at least one byte of the load memory operand (e.g. by the load memory operation hitting the older store memory operation in a store buffer). The store tags of executing stores are compared to the tags in the store tag buffer to detect cases in which the older store memory operation is reissued. If a match is detected, the corresponding load memory operation is reissued as well. Advantageously, correct execution of the load memory operation is ensured in the cases in which the older store memory operation is reissued (and hence its result may change).
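The store tag buffer may be sketched as follows, with invented tag values. The buffer records which older store each load forwarded its memory operand from, so that if that store is later reissued (and hence its result may change), the dependent load is reissued as well:

```python
class StoreTagBuffer:
    """Tracks, per load, the identifier (tag) of the older store the
    load hit in the store buffer during execution."""

    def __init__(self):
        self.forwards = {}  # load_seq -> tag of the store forwarded from

    def record_forward(self, load_seq, store_tag):
        self.forwards[load_seq] = store_tag

    def on_store_reissue(self, store_tag):
        """Compare the reissuing store's tag against the buffered tags;
        return the loads that must be reissued on a match."""
        return sorted(l for l, t in self.forwards.items() if t == store_tag)
```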
Broadly speaking, a scheduler is contemplated, comprising an instruction buffer configured to store a first memory operation, an issue pick circuit configured to select the first memory operation for issue from the instruction buffer, and a control circuit coupled to the issue pick circuit. The control circuit is also coupled to receive a first signal indicating a retry condition for the first memory operation. The control circuit is configured to maintain a first execution state of the first memory operation, wherein the control circuit is configured to change the first execution state to an executing state responsive to the issue pick circuit selecting the first memory operation for issue, and wherein the control circuit is configured to change the first execution state to a not executed state responsive to the first signal.
Additionally, a processor is contemplated, comprising a scheduler and a load/store unit. The scheduler is configured to store a first memory operation and to select the first memory operation for issue. Additionally, the scheduler is configured to maintain a first execution state of the first memory operation, and is configured to change the first execution state to an executing state responsive to selecting the first memory operation for issue. The load/store unit is coupled to receive the first memory operation in response to issue thereof from the scheduler. The load/store unit is configured to detect a retry condition for the first memory operation and to assert a first signal in response to detecting the retry condition. In response to the first signal, the scheduler is configured to change the first execution state to a not executed state. Also, a computer system is contemplated including the processor and an input/output (I/O) device configured to communicate between the computer system and another computer system to which the I/O device is couplable.
Still further, a method is contemplated. A first memory operation is issued from a scheduler. The first memory operation is retained in the scheduler subsequent to the issuing. The first memory operation is reissued from the scheduler responsive to a retry condition corresponding to the first memory operation.
Moreover, a processor is contemplated, comprising a scheduler and a load/store unit. The scheduler is configured to store a first memory operation and to select the first memory operation for issue. Additionally, the scheduler is configured to retain the first memory operation subsequent to issuing the first memory operation. The load/store unit is coupled to receive the first memory operation in response to issue thereof from the scheduler, and is configured to detect a retry condition for the first memory operation and to assert a first signal in response to detecting the retry condition. The scheduler is coupled to receive the first signal, and is configured to reissue the first memory operation responsive to the first signal.