Today's microprocessors run at clock frequencies of several GHz, with execution units capable of executing several instructions per clock cycle. By contrast, memory access times have remained stubbornly static while processor execution rates have increased. As a result, a processor may be able to execute around a thousand instructions in the time taken to perform a single access to main memory.
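The "around a thousand instructions" figure can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the 3 GHz clock, the issue width of 4 instructions per cycle, and the 100 ns main-memory latency are assumed round numbers, not measurements from any particular processor, and the function name is hypothetical.

```python
def instructions_per_memory_access(clock_ghz, instructions_per_cycle, memory_latency_ns):
    """Estimate how many instructions could execute during one main-memory access.

    All three inputs are assumed, illustrative values.
    """
    # GHz multiplied by ns gives a plain cycle count (1e9 * 1e-9 = 1).
    cycles_per_access = clock_ghz * memory_latency_ns
    return cycles_per_access * instructions_per_cycle


# Assumed figures: 3 GHz clock, 4 instructions per cycle, 100 ns memory latency.
estimate = instructions_per_memory_access(3.0, 4, 100.0)
print(estimate)
```

Under these assumptions a single main-memory access costs 300 cycles, or roughly 1200 instruction slots, which is the order of magnitude quoted above.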
Caches hide much of this latency for many memory accesses, but a significant proportion of accesses still miss in the caches and must go to main memory. If the processor stalls on each such access, considerable performance is lost. An alternative is to allow the processor to execute speculatively past such long-latency instructions, so that instructions execute out of order. Considerable bookkeeping is then required to ensure that the results are consistent with executing all instructions, including memory-accessing instructions, in program order.
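One common form of this bookkeeping is a reorder buffer that lets instructions complete out of order but retires them strictly in program order. The toy model below is a sketch under strong simplifying assumptions that are not taken from the text: one instruction issues per cycle, instructions have no data dependences, and the buffer is unbounded. The function name and the latency values are hypothetical.

```python
from collections import deque


def retire_in_order(latencies):
    """Toy model: instruction i issues at cycle i and finishes at i + latencies[i].

    Returns (completion_order, retired), where retired is a list of
    (instruction_index, retirement_cycle) pairs in retirement order.
    Assumes every latency is at least 1.
    """
    finish = [i + lat for i, lat in enumerate(latencies)]
    # Order in which results become available (out of program order).
    completion_order = sorted(range(len(latencies)), key=lambda i: (finish[i], i))

    # Reorder buffer holds instructions in program order; only the head may retire.
    rob = deque(range(len(latencies)))
    retired = []
    cycle = 0
    while rob:
        cycle += 1
        while rob and finish[rob[0]] <= cycle:
            retired.append((rob.popleft(), cycle))
    return completion_order, retired


# Instruction 0 models a cache miss (assumed 300 cycles); the rest are short.
completion_order, retired = retire_in_order([300, 1, 1, 2])
print(completion_order)
print(retired)
```

In this run the three short instructions finish long before the miss, yet none of them retires until instruction 0 does, so the architecturally visible order remains program order. A real pipeline must also track dependences, memory ordering, and recovery from misspeculation, which this sketch omits.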
The move towards chip multiprocessors (CMPs) requires multiprocessor cache coherence to be integrated on the same chip as the execution pipelines. The interaction between the execution pipelines and the memory system can be quite complex, especially when the pipelines execute instructions out of order.