1. Field of the Invention
The present invention relates to the field of computers. More specifically, the present invention relates to computer architecture.
2. Description of the Related Art
The phenomenon of a load operation accessing a memory location that has been modified by a store operation is commonly referred to as a memory Read-after-Write (RAW) data hazard, or memory RAW aliasing. Memory RAW aliasing occurs between a significant percentage of load operations and respective store operations. There are a variety of reasons for the common occurrence of memory RAW aliasing in many applications, including register pressure, pointer disambiguation, parameter passing, and integer to floating point moves.
Until recently, many processors did not provide an instruction to move data directly from an integer register to a floating point (FP) register, or vice versa. Without such an instruction, data is stored to memory and then reloaded, which introduces memory RAW aliasing. While FP-to-integer and integer-to-FP move functionality is now available on many modern processors, much legacy code does not take advantage of it. In addition, unless all processors across a product line support the integer-to-FP and FP-to-integer moves, generic applications may not be able to explicitly leverage the new move instructions.
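The store-and-reload pattern can be illustrated in software. The following is a minimal sketch, not a description of any particular processor: the 8-byte scratch buffer stands in for a memory location, and the function name is hypothetical. The integer bits are written to the location and then immediately read back as a double, so the read aliases the preceding write.

```python
import struct

def move_int_to_fp_via_memory(int_bits):
    """Copy a 64-bit integer bit pattern into a double through memory."""
    scratch = bytearray(8)                         # stands in for a memory location
    struct.pack_into("<q", scratch, 0, int_bits)   # store from the integer side
    (fp_value,) = struct.unpack_from("<d", scratch, 0)  # reload on the FP side
    return fp_value
```

The load depends on the store to the same address, which is the memory RAW aliasing described above.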
The need to frequently store register data to memory and reload it can have a detrimental impact on performance, as the latency of RAW bypassing through memory is very high. While typical level-1 cache hit latencies are only 1 to 3 cycles, the bypass of a store value to an aliasing load can take up to an order of magnitude longer.
In conventional processors, store operations first write into a store queue/store buffer (SQ/SB). Load operations check the store buffer in parallel with the data cache. If the store buffer holds a store to the requested memory location, the load value is retrieved from the store buffer. The latency of accessing the store buffer is often larger than that of accessing the level-1 cache. Hence, a stale value from the cache may be used in operations dependent on the load operation before the signal indicating memory RAW aliasing arrives from the store buffer. This situation is more likely when the separation in cycles between the store operation and the load operation is small, since the cache is unlikely to have been updated yet. In this case, the load mis-speculation is corrected by reissuing the load operation and its dependents with the correct value from the store buffer.
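The parallel lookup described above can be sketched as a small behavioral model. This is an illustration under stated assumptions only: the class and function names are hypothetical, timing is not modeled, and a store buffer hit is treated as always forcing a reissue of the load.

```python
class StoreBuffer:
    """Hypothetical store buffer: (address, value) pairs, oldest first."""

    def __init__(self):
        self.entries = []

    def write(self, address, value):
        self.entries.append((address, value))

    def lookup(self, address):
        # The youngest matching store wins, so search from the newest entry.
        for addr, value in reversed(self.entries):
            if addr == address:
                return True, value
        return False, None


def load(address, cache, store_buffer):
    """Return (value, reissued) for a load that probes both structures."""
    speculative = cache.get(address, 0)             # fast path, may be stale
    hit, forwarded = store_buffer.lookup(address)   # slower path
    if hit:
        # Dependents may already have used the cache value, so the load
        # is reissued with the value forwarded from the store buffer.
        return forwarded, True
    return speculative, False
```

In this sketch a load to an address with a pending store returns the buffered value and reports a reissue, while a load to any other address completes from the cache.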
Since the cost of such mis-speculation typically exceeds 20 cycles in conventional processors, conventional processors may use mechanisms to detect that certain store operations and load operations are likely to alias. If a load operation aliases repeatedly with a store operation on many dynamic executions, the load operation and the store operation may, for instance, be tagged in the instruction cache. On subsequent executions, tagged load operations are not permitted to issue until tagged store operations have retired. Thus, the processor does not permit certain load operations to speculate past certain store operations, while permitting the rest of the load operations to freely speculate past store operations.
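One way to picture such a tagging mechanism is sketched below. The pairing of a load with specific stores, the threshold of two observed aliases, and all names are assumptions made for illustration; the text above does not specify how the tags are assigned or stored.

```python
class AliasPredictor:
    """Hypothetical predictor that tags loads which repeatedly alias."""

    def __init__(self, threshold=2):
        self.threshold = threshold   # assumed alias count before tagging
        self.counts = {}             # (load_pc, store_pc) -> observed aliases
        self.tagged = {}             # load_pc -> store_pcs it must wait for

    def record_alias(self, load_pc, store_pc):
        key = (load_pc, store_pc)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= self.threshold:
            self.tagged.setdefault(load_pc, set()).add(store_pc)

    def may_issue(self, load_pc, unretired_store_pcs):
        # A tagged load waits while any store it is tagged against is
        # still unretired; untagged loads speculate freely.
        waiting_on = self.tagged.get(load_pc, set())
        return not (waiting_on & set(unretired_store_pcs))
```

Loads that never reach the threshold remain untagged and are always allowed to issue, which matches the selective restriction described above.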
Some conventional out-of-order processors permit a load operation to be issued even before an aliasing store operation writes into the store buffer. When such a processor executes a load operation before an aliasing store operation writes into the store buffer, the processor cannot detect that the load operation aliases with an older, as yet unissued store operation. When the store operation issues, these processors determine whether a younger load operation was issued earlier than the older aliasing store operation. If so, the processor reissues the younger load operation after the store operation has written into the store buffer. In these processors, the load address and associated information are kept in a load queue (LQ). Sometimes the load queue is combined with the store buffer into a single structure that is commonly called the Load Store Queue (LSQ). Stores check the LSQ, detect younger loads with a matching address, and cause them to reissue.
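The check a store performs against the load queue can be sketched as follows. The entry fields and the function name are hypothetical, and program order is represented by an assumed sequence number in which a larger value means a younger operation.

```python
from dataclasses import dataclass

@dataclass
class LoadEntry:
    age: int            # program-order sequence number; larger is younger
    address: int
    issued: bool = False

def loads_to_reissue(load_queue, store_age, store_address):
    """Return issued loads younger than the store that match its address."""
    return [
        ld for ld in load_queue
        if ld.issued and ld.age > store_age and ld.address == store_address
    ]
```

Only loads that have already issued, are younger than the store, and match its address are selected; older loads and loads that have not yet issued are unaffected.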
Recovering from mis-speculation and reissuing instructions complicates processor design. Less complex approaches have also been investigated and utilized. A store operation may be split into two parts: the address generation part and the actual store. Younger load operations wait until the address generation part of the store operation completes, at which point the processor allows the load operation to issue, unless its address matches the address of the older, as yet unissued store operation.
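The issue condition for a load under this split-store approach can be sketched as a simple predicate. The dictionary layout and names are assumptions for illustration: each older store records its address (None until address generation completes) and whether its data has been written.

```python
def load_may_issue(load_address, older_stores):
    """Decide whether a load may issue given the older stores in flight.

    older_stores: list of dicts with keys 'address' (None until the
    address generation part completes) and 'data_written' (bool).
    """
    for st in older_stores:
        if st["address"] is None:
            return False   # store address unknown: the load waits
        if st["address"] == load_address and not st["data_written"]:
            return False   # matches an older store whose data is pending
    return True
```

A load is held back only while an older store's address is unknown or while a matching older store has not yet written its data, so no reissue is needed in this scheme.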
Since the majority of load operations do not alias with older store operations, it is advantageous for processors to allow most loads to speculate. The previously described mechanisms may restrict speculation for certain load operations. When load operations do alias, however, these mechanisms can require the load operation to reissue or to wait until the aliasing store operation is retired or written into the store buffer. Improved techniques are desired.