1. Technical Field
The present invention relates generally to computer processing and, in particular, to the simultaneous finish of stores and dependent loads.
2. Description of the Related Art
A common problem found in high performance microprocessor designs is detecting and handling load address dependencies, and in particular, load and store memory address conflicts. Generally, a load and store memory address conflict occurs when a load instruction follows a store instruction directed to the same memory address, and the store instruction has not yet been committed to memory or otherwise cleared.
In out-of-order (OOO) execution, a processor may issue a load instruction before issuing a store instruction that appears earlier than the load instruction in program order. This reordering is a common optimization used in many processors to improve performance by hiding load latencies.
However, when more than one instruction references a particular location for an operand, either reading the operand as an input or writing the operand as an output, executing such instructions in an order different from the original program order can lead to various data problems. One such data problem is known as a “read-after-write” (RAW) problem, which arises when an instruction refers to a result that has not yet been calculated or retrieved. That is, a read from a register or memory location must return the value placed there by the last write in program order, and not some other write. The condition implicated by a read-after-write is referred to as a true dependency, and typically requires the instructions to execute in program order to avoid the problem. In such a case, the load is considered to be dependent on the write (store), and is referred to as a dependent load.
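The read-after-write problem described above can be illustrated with a minimal sketch. It is a simplified software model and not a description of any particular processor; the function `run`, the tuple encoding of instructions, and the addresses and values used are all hypothetical and chosen only for illustration.

```python
# Illustrative model of a read-after-write (RAW) dependency.
# All names and values are hypothetical; this is not processor code.

def run(instructions, memory):
    """Execute (op, address, value) tuples in the order given.

    Returns the list of values observed by the loads.
    """
    loaded = []
    for op, address, value in instructions:
        if op == "store":
            memory[address] = value
        elif op == "load":
            loaded.append(memory[address])
    return loaded

# Program order: the store to address 0x10 precedes the load from 0x10.
program_order = [("store", 0x10, 42), ("load", 0x10, None)]

# Reordered: the load is executed before the earlier store.
reordered = [program_order[1], program_order[0]]

in_order_result = run(program_order, {0x10: 7})  # load sees the value just stored
reordered_result = run(reordered, {0x10: 7})     # load sees the stale value

print(in_order_result, reordered_result)  # prints: [42] [7]
```

In this sketch the reordered load returns the stale value 7 rather than 42, which is the value placed at that address by the last write in program order.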
As noted above, a load and store memory address conflict, in which a load instruction follows a store instruction directed to the same memory address while the store instruction has not yet been committed to memory or otherwise cleared, is typically referred to as a “load-hit-store” condition.
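A load-hit-store check of the kind described above might be sketched as follows. This is a simplified model under stated assumptions: the `PendingStore` record, the `is_load_hit_store` function, and the exact-address comparison are hypothetical. Actual hardware would typically compare overlapping byte ranges rather than single addresses.

```python
# Illustrative load-hit-store check against a list of older stores.
# The data structure and names are assumptions made for this sketch.
from dataclasses import dataclass

@dataclass
class PendingStore:
    address: int
    committed: bool = False  # True once the store has been written to memory

def is_load_hit_store(load_address, older_stores):
    """Return True if any older, uncommitted store targets the load address."""
    return any(
        store.address == load_address and not store.committed
        for store in older_stores
    )

older_stores = [
    PendingStore(address=0x20),                  # still pending
    PendingStore(address=0x30, committed=True),  # already committed
]

conflict = is_load_hit_store(0x20, older_stores)     # same address, uncommitted
no_conflict = is_load_hit_store(0x30, older_stores)  # store already committed
unrelated = is_load_hit_store(0x40, older_stores)    # no store to this address
```

Only the first query reports a conflict, because it is the only load whose address matches a store that has not yet been committed.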
FIG. 1 shows the common case scenario 100 for a load instruction 121 issued before a store instruction 111 that appears earlier than the load instruction 121 in program order. The load instruction 121 is processed in a load pipeline 120, and the store instruction 111 is processed in a store pipeline 110.
Consider the following behavior of the load instruction 121 of FIG. 1 in a processor that cracks a store instruction into a data store and a store address generation. The load instruction 121 checks the availability of the value stored at the load address, is rejected and reissued (multiple times) because the stored value is not ready, and eventually reads the stored value once it is ready. That is, the load instruction 121 is initially rejected 131 at time T because the stored value is not ready, and is ultimately reissued 133 to read the stored value at time T+Ta+Tpenalty, as described in further detail herein below, where time T+Ta is the time when the stored value is actually ready, that is, when the store instruction has completed.
The preceding behavior causes at least two performance problems. One performance problem is that such a load suffers from an extra penalty (Tpenalty in FIG. 1) because it does not read the value as soon as the store is finished (at time T+Ta). Another performance problem is that the pipelines 110 and 120 and the instruction issue bandwidth are wasted by the repeated, rejected reissues of the load instruction 121.
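The relationship among T, Ta, and Tpenalty can be made concrete with a small timing sketch. The model below assumes a fixed interval between successive issues of the load, which is an assumption of this sketch and is not stated above; the function name `reissue_timeline` and the example numbers are likewise hypothetical.

```python
# Illustrative timing model of a load that is rejected and reissued
# until the stored value is ready. The fixed reissue interval is an
# assumption of this sketch.

def reissue_timeline(t, t_a, reissue_interval):
    """Return (read_time, t_penalty, rejected_issues).

    t                 time of the first issue of the load (time T)
    t_a               delay until the stored value is ready (Ta)
    reissue_interval  assumed fixed gap between successive issues
    """
    store_ready = t + t_a
    issue_time = t
    rejected = 0
    # Every issue that arrives before the value is ready is rejected.
    while issue_time < store_ready:
        rejected += 1
        issue_time += reissue_interval
    t_penalty = issue_time - store_ready
    return issue_time, t_penalty, rejected

read_time, t_penalty, rejected = reissue_timeline(t=100, t_a=10, reissue_interval=4)
print(read_time, t_penalty, rejected)  # prints: 112 2 3
```

With these assumed numbers, the stored value is ready at time 110, but the load reads it only at time 112 (a Tpenalty of 2) after three rejected issues, each of which occupied an issue slot without making progress.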
Such load instructions are observed in real-world programs that save data in memory and read the data back within a short period. Typical examples are bytecode interpreters (e.g., Ruby), which keep updating control variables (e.g., the stack pointer) and stack entries in memory.
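The access pattern of such interpreters can be sketched with a toy stack machine that records its stack-entry accesses. The bytecode format, the function names, and the trace representation are hypothetical. The sketch traces only the stack entries, not the stack pointer itself, which is held in an ordinary local variable here.

```python
# Illustrative stack-machine interpreter that records its memory traffic.
# The bytecode format and all names are hypothetical.

def interpret(bytecode):
    """Run a tiny stack bytecode; return (result, access_trace)."""
    stack = [0] * 16  # stack entries kept in "memory"
    sp = 0            # stack pointer
    trace = []        # (kind, slot) for each stack-entry access

    def push(value):
        nonlocal sp
        stack[sp] = value
        trace.append(("store", sp))
        sp += 1

    def pop():
        nonlocal sp
        sp -= 1
        trace.append(("load", sp))
        return stack[sp]

    for op, arg in bytecode:
        if op == "push":
            push(arg)
        elif op == "add":
            right = pop()
            left = pop()
            push(left + right)
    return pop(), trace

def store_then_load_pairs(trace):
    """Count loads whose immediately preceding access is a store to the same slot."""
    return sum(
        1
        for prev, cur in zip(trace, trace[1:])
        if prev[0] == "store" and cur[0] == "load" and prev[1] == cur[1]
    )

result, trace = interpret([("push", 2), ("push", 3), ("add", None)])
pairs = store_then_load_pairs(trace)
print(result, pairs)  # prints: 5 2
```

Even in this three-instruction program, two of the three loads read a slot that was written by the immediately preceding access, which is the store-followed-by-dependent-load pattern described above.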