The present disclosure relates generally to processors that execute instructions speculatively out of program order, and specifically to scheduling address-dependent memory instructions within such processors.
Processors execute programs, which are typically represented as ordered sequences of instructions. One technique for increasing the number of instructions executed per clock cycle involves executing instructions speculatively out of program order, which may increase performance of the processor. For example, a processor that executes instructions in order may experience delays when an instruction cannot complete execution because the instruction is waiting on data from memory. In out-of-order execution, instructions may be executed in a different order than that specified in the program sequence. Thus, other instructions may be executed while an instruction is waiting for data from memory. In an out-of-order processor with multiple execution units, the processor may issue and execute multiple instructions per clock cycle. Out-of-order execution may allow the execution units to operate in parallel, thereby increasing the number of instructions executed concurrently.
When scheduling instructions for execution, a processor exploiting out-of-order execution is typically constrained by dependencies between instructions, which may prohibit their concurrent execution. A second instruction depends upon a first instruction if the first instruction must be executed before the second instruction, e.g., if a result produced by the first instruction is employed as a source operand of the second instruction. In this case, the second instruction is said to have a dependency upon the first instruction. Other types of dependencies, such as memory dependencies, may also exist in an out-of-order processor.
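By way of illustration only, the operand dependency described above may be sketched as follows. The `Instr` record, the register names, and the `depends_on` helper are hypothetical and are introduced solely for this example; they are not part of any particular processor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()


def depends_on(second: Instr, first: Instr) -> bool:
    """Return True if `second` uses the result of `first` as a source operand."""
    return first.dest is not None and first.dest in second.srcs


add = Instr("add", dest="r1", srcs=("r2", "r3"))
mul = Instr("mul", dest="r4", srcs=("r1", "r5"))  # reads r1, which add produces
sub = Instr("sub", dest="r6", srcs=("r7", "r8"))  # shares no register with add

print(depends_on(mul, add))  # mul must wait for add
print(depends_on(sub, add))  # sub may execute concurrently with add
```

In this sketch, `mul` has a dependency upon `add`, whereas `sub` does not and could be scheduled alongside `add`.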
For example, it is desirable to execute younger load memory operations (also referred to as loads) prior to older store memory operations (also referred to as stores) to increase performance of the processor. The loads provide operands for execution of dependent instructions, and thus executing the loads early allows other instructions to be executed. A first operation is “older” than a second operation if the first operation is prior to the second operation in program order. Conversely, a first operation is “younger” than a second operation if the first operation is subsequent to the second operation in program order. If the younger loads have no dependency on the older stores, the younger loads need not wait for the execution of the older stores. However, in some cases, a load memory operation may depend on an older store memory operation, e.g., when the store memory operation updates at least one byte accessed by the load memory operation. In such cases, the load memory operation is incorrectly executed if executed prior to the store memory operation.
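By way of illustration only, the condition under which a load depends on an older store may be sketched as a byte-range overlap test. The `MemOp` record and its fields (`seq`, `addr`, `size`) are assumptions made for this example, with a smaller `seq` denoting an older operation in program order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """Hypothetical memory operation record."""
    kind: str  # "load" or "store"
    seq: int   # position in program order; smaller means older
    addr: int  # first byte address accessed
    size: int  # number of bytes accessed


def is_older(a: MemOp, b: MemOp) -> bool:
    """Return True if `a` is prior to `b` in program order."""
    return a.seq < b.seq


def load_depends_on_store(load: MemOp, store: MemOp) -> bool:
    """Return True if the older store updates at least one byte the load reads."""
    if load.kind != "load" or store.kind != "store":
        return False
    if not is_older(store, load):
        return False
    # Two half-open byte ranges overlap when each starts before the other ends.
    return (load.addr < store.addr + store.size
            and store.addr < load.addr + load.size)


store = MemOp("store", seq=0, addr=0x100, size=4)       # writes 0x100..0x103
load_overlap = MemOp("load", seq=1, addr=0x102, size=4)  # reads 0x102..0x105
load_disjoint = MemOp("load", seq=2, addr=0x104, size=4)  # reads 0x104..0x107

print(load_depends_on_store(load_overlap, store))
print(load_depends_on_store(load_disjoint, store))
```

Here the first load shares bytes 0x102 and 0x103 with the store and therefore depends on it, while the second load accesses no byte written by the store and need not wait.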
The dependency between the load and the store is typically not known until the memory addresses are calculated during execution of the store and the load. Because the memory addresses are not known prior to execution, it is not possible to determine with certainty whether a dependency exists between the load and the store. To avoid incorrectly executing the load prior to the store, a processor may execute the store and the load in program order. However, if there is no actual dependency between the load and the store, performance may be lost due to the delayed execution of the load, which may cause the delayed execution of instructions that are dependent on the load.
Because it is desirable to be as aggressive as possible in scheduling instructions out of order in an attempt to maximize performance, a processor may allow the younger load to execute prior to the older store with little regard for the actual order of the instructions, and then recover from incorrect execution of the load when the processor detects that the load is dependent on the store. For example, the load may be “replayed” by cancelling its current execution and reexecuting it at a later time. Unfortunately, incorrectly executing the load out of order and taking subsequent corrective actions to achieve correct execution may reduce performance, because resources are consumed unnecessarily to execute the load, only to cancel it and wait for subsequent reexecution. Additionally, because the data retrieved by the load may already have been bypassed to operations dependent on the load, those operations, and in some cases the entire pipeline of speculative instructions, may also need to be flushed and reexecuted, which may cause a substantial reduction in performance.
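By way of illustration only, the trade-off described above may be sketched with a toy model of a single store followed in program order by a single load. The `run_pair` function, the initial memory contents, and the stored value are hypothetical; the model assumes single-byte accesses and ignores pipeline timing and the flushing of dependent operations.

```python
def run_pair(store_addr: int, load_addr: int, speculate: bool):
    """Model one older store and one younger load.

    Returns (value finally seen by the load, number of times the load executed).
    """
    memory = {0x100: 1, 0x200: 7}  # arbitrary initial contents (assumption)
    store_value = 42
    load_executions = 0

    if not speculate:
        # Conservative policy: execute the store, then the load, in program order.
        memory[store_addr] = store_value
        loaded = memory[load_addr]
        load_executions += 1
        return loaded, load_executions

    # Aggressive policy: the younger load executes before the older store.
    loaded = memory[load_addr]
    load_executions += 1

    # The store executes later; its address is only now known.
    memory[store_addr] = store_value

    # Dependency detected after the fact: cancel the load and replay it.
    if store_addr == load_addr:
        loaded = memory[load_addr]
        load_executions += 1

    return loaded, load_executions


print(run_pair(0x100, 0x200, speculate=True))   # no dependency, no replay
print(run_pair(0x100, 0x100, speculate=True))   # dependency, load replayed
print(run_pair(0x100, 0x100, speculate=False))  # in order, single execution
```

In this sketch, speculation costs nothing when the addresses differ, but when the load depends on the store the load executes twice to obtain the correct value, which corresponds to the wasted work discussed above.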