Field of the Invention
This invention relates to computing systems, and more particularly, to efficiently predicting and processing memory access dependencies.
Description of the Relevant Art
Modern microprocessors may include one or more processor cores, or processors, wherein each processor is capable of executing instructions of a software application. Ideally, every clock cycle produces useful execution of an instruction for each stage of a pipeline. However, a stall in a pipeline may prevent useful work from being performed in a given pipeline stage during a particular clock cycle. Stalls may be caused by, among other things, cache misses and dependencies between instructions. To hide the penalty of stalls, microprocessors utilize out-of-order execution, simultaneous processing of multiple instructions, data forwarding or bypassing, and speculative execution based on predictions. Such predictions may include branch predictions and memory dependence predictions.
Although data cache hits have smaller latencies than data cache misses, the hits may still consume multiple cycles to complete. The total number of processor cycles required to retrieve data from memory has been growing rapidly as processor frequencies have increased faster than system memory access times. However, in some cases, the requested data for a read operation, or a load instruction, may still be on the die of the processor core.
A store buffer may be used to buffer outstanding store instructions. The buffered store instructions may or may not have yet received the store data to send to a data cache in the memory hierarchy subsystem. One example of a data dependency between instructions is a read-after-write (RAW) hazard between a load instruction and an older store instruction. The load instruction, or a read operation, attempts to read a memory location that has been modified by an older (in program order) store instruction. When the load instruction is processed, the older store instruction may not have yet committed its results to the memory location.
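The RAW hazard described above can be illustrated with a minimal software sketch. It is only a toy model, not the mechanism of any particular processor: the `StoreEntry` fields, the sequence numbers used to represent program order, and the `find_raw_store` helper are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    seq: int              # hypothetical program-order sequence number
    addr: int             # target memory address of the store
    data: Optional[int]   # None until the store data has been received


def find_raw_store(store_buffer, load_seq, load_addr):
    """Return the youngest store that is older than the load and writes
    the same address, or None if the load has no RAW dependency."""
    older_matches = [
        s for s in store_buffer
        if s.seq < load_seq and s.addr == load_addr
    ]
    return max(older_matches, key=lambda s: s.seq, default=None)


store_buffer = [
    StoreEntry(seq=1, addr=0x100, data=7),
    StoreEntry(seq=3, addr=0x100, data=None),  # store data not yet received
    StoreEntry(seq=6, addr=0x100, data=9),     # younger than the load below
]

# A load at sequence number 5 depends on the store at sequence number 3.
hazard = find_raw_store(store_buffer, load_seq=5, load_addr=0x100)
# A load to an address with no buffered store has no RAW dependency.
no_hazard = find_raw_store(store_buffer, load_seq=5, load_addr=0x200)
```

In this sketch the load depends on a store whose data has not yet arrived, which corresponds to the case in which the older store has not yet committed its results.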
In some cases, the load instruction may obtain its requested data from the memory subsystem, such as a data cache. In other cases, the load instruction may obtain the requested data from the store buffer. Typically, each load instruction accesses each of the sources, such as the data cache and the store buffer, in parallel, although only one source has the correct requested data. A first type of load instruction may access a memory location with relatively high spatial and/or temporal locality. The relatively high locality may correspond to both read and write accesses. Therefore, the first type of load instruction may have a high number of RAW data dependencies, and data forwarding may occur more frequently for this first type of load instruction. However, a second type of load instruction may not have high locality and, accordingly, may not be a good candidate for data forwarding. Stack accesses and non-stack accesses are examples of at least two types of load instructions with different locality. Nevertheless, load instructions of different types may each be processed in a similar manner, with each accessing every possible source of data.
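The conventional behavior described above, in which every load probes every source regardless of its type, might be modeled as follows. This is a hedged sketch under simplifying assumptions: the data cache is a plain dictionary that always hits, store buffer entries are dictionaries with hypothetical `seq`, `addr`, and `data` keys, and `service_load` is an illustrative name rather than part of any real design.

```python
def service_load(load_seq, addr, store_buffer, data_cache):
    """Probe both sources for a load, then select the correct one."""
    # Both lookups are performed unconditionally, mirroring the parallel
    # access of the data cache and the store buffer by every load.
    cache_value = data_cache.get(addr)
    older = [
        s for s in store_buffer
        if s["seq"] < load_seq and s["addr"] == addr
    ]
    forwarding_store = max(older, key=lambda s: s["seq"], default=None)

    if forwarding_store is None:
        # No RAW dependency: the data cache holds the correct data.
        return ("cache", cache_value)
    if forwarding_store["data"] is None:
        # The older store has not yet received its data.
        return ("wait", None)
    # RAW dependency with data available: forward from the store buffer.
    return ("store_buffer", forwarding_store["data"])


data_cache = {0x7FF0: 1, 0x2000: 42, 0x3000: 0}
store_buffer = [
    {"seq": 1, "addr": 0x7FF0, "data": 5},     # e.g. a stack store
    {"seq": 4, "addr": 0x3000, "data": None},  # store data still pending
]

stack_load = service_load(2, 0x7FF0, store_buffer, data_cache)
non_stack_load = service_load(3, 0x2000, store_buffer, data_cache)
pending_load = service_load(5, 0x3000, store_buffer, data_cache)
```

Here the store buffer probe for the non-stack load finds nothing, and the cache probe for the stack load is discarded, which illustrates the work spent on a source that does not supply the data.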
In view of the above, efficient methods and mechanisms for predicting and processing memory access dependencies are desired.