1. Field of the Invention
The present invention relates to the design of processors within computer systems. More specifically, the present invention relates to a technique that facilitates reordering loads through cacheline marking.
2. Related Art
Advances in semiconductor fabrication technology have given rise to dramatic increases in microprocessor clock speeds. This increase in microprocessor clock speeds has not been matched by a corresponding increase in memory access speeds. Hence, the disparity between microprocessor clock speeds and memory access speeds continues to grow, and is beginning to create significant performance problems. Execution profiles for fast microprocessor systems show that a large fraction of execution time is spent not within the microprocessor core, but within memory structures outside of the microprocessor core. This means that the microprocessor systems spend a large fraction of time waiting for memory references to complete instead of performing computational operations.
Efficient caching schemes can help to reduce the number of accesses to memory. However, when a memory operation, such as a load, generates a cache miss, the subsequent access to level-two (L2) cache or memory can require dozens or hundreds of clock cycles to complete, during which time the processor is typically idle, performing no useful work.
One way to mitigate this problem is to speculatively execute subsequent instructions (including loads) during cache misses. Specifically, the processor does not wait for loads that generate cache misses to complete, but instead speculatively performs subsequent loads. Consequently, a large number of loads can be speculatively performed out of program order. Eventually, the processor completes the earlier loads, and if the speculative execution is successful, commits the speculative loads to the architectural state of the processor.
Some existing speculative-execution techniques use dedicated hardware structures to maintain the addresses of speculative loads while snooping invalidations to detect whether any of the speculatively loaded cachelines has been invalidated. These existing techniques “fail” a speculative load if such an invalidation is detected. Unfortunately, these existing techniques require dedicated hardware resources that do not scale well for a large number of speculative loads.
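The address-tracking approach described above can be illustrated with a minimal software sketch. It is not taken from any particular processor design: the class name `SpeculativeLoadTable`, the `capacity` parameter, and the 64-byte line size are assumptions made purely for illustration. The sketch shows the two properties discussed in the text: an invalidation that hits a tracked cacheline fails the speculation, and a fixed-size structure runs out of entries as the number of speculative loads grows.

```python
class SpeculativeLoadTable:
    """Hypothetical fixed-size table of cachelines touched by speculative loads."""

    def __init__(self, capacity, line_size=64):
        self.capacity = capacity      # number of entries the dedicated structure holds (assumed)
        self.line_size = line_size    # cacheline size in bytes (assumed)
        self.lines = set()            # cacheline numbers currently tracked
        self.failed = False           # set when a tracked line is invalidated

    def _line(self, addr):
        # Map a byte address to its cacheline number.
        return addr // self.line_size

    def record_load(self, addr):
        """Track a speculative load; return False if no entry is free."""
        line = self._line(addr)
        if line in self.lines:
            return True               # line is already tracked
        if len(self.lines) >= self.capacity:
            return False              # table full: this load cannot be tracked
        self.lines.add(line)
        return True

    def snoop_invalidation(self, addr):
        """Check an observed invalidation against the tracked lines."""
        if self._line(addr) in self.lines:
            self.failed = True

    def commit(self):
        """Return True if no tracked line was invalidated; reset the table."""
        ok = not self.failed
        self.lines.clear()
        self.failed = False
        return ok


table = SpeculativeLoadTable(capacity=2)
table.record_load(0x1000)
table.record_load(0x2000)
tracked = table.record_load(0x3000)   # third distinct line does not fit
table.snoop_invalidation(0x2010)      # falls in the same line as 0x2000
committed = table.commit()
print(tracked, committed)
```

In this sketch the third load cannot be tracked because the table holds only two entries, which is the scaling limitation noted above, and the invalidation of a tracked line causes the commit to fail.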
Another existing technique uses metadata in the L1 data cache to indicate whether a thread has speculatively loaded a given cacheline. (See U.S. Pat. No. 7,089,374, entitled, “Selectively Unmarking Load-Marked Cache Lines during Transactional Program Execution,” by inventors Marc Tremblay and Shailender Chaudhry.) This technique “fails” a speculative load if the corresponding speculatively loaded cacheline is invalidated or replaced from the L1 data cache. However, such invalidations and replacements occur more frequently than common coherence conflicts, and consequently cause a significant number of failed speculative loads. Note that failed speculative loads can consume memory bandwidth and can thereby reduce the performance of non-speculative loads.
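The following is a rough sketch of per-cacheline load marking, written to show why replacements cause failures even when no coherence conflict exists. It is not the implementation described in the cited patent: the class `MarkedL1Cache`, the direct-mapped organization, the four-set size, and the 64-byte line size are all assumptions chosen to keep the example small.

```python
class MarkedL1Cache:
    """Hypothetical direct-mapped L1 data cache with one load-mark bit per line."""

    def __init__(self, num_sets=4, line_size=64):
        self.num_sets = num_sets            # number of cache sets (assumed)
        self.line_size = line_size          # cacheline size in bytes (assumed)
        self.tags = [None] * num_sets       # cacheline number held in each set
        self.marked = [False] * num_sets    # load-mark metadata per set
        self.failed = False                 # set when a marked line is lost

    def _index(self, addr):
        # Return the cacheline number and the set it maps to.
        line = addr // self.line_size
        return line, line % self.num_sets

    def speculative_load(self, addr):
        """Bring the line into the cache if needed and load-mark it."""
        line, idx = self._index(addr)
        if self.tags[idx] != line:
            if self.marked[idx]:
                self.failed = True          # a marked line is being replaced
            self.tags[idx] = line
        self.marked[idx] = True

    def invalidate(self, addr):
        """Apply a coherence invalidation to the cache."""
        line, idx = self._index(addr)
        if self.tags[idx] == line:
            if self.marked[idx]:
                self.failed = True          # a marked line is being invalidated
            self.tags[idx] = None
            self.marked[idx] = False


cache = MarkedL1Cache(num_sets=4)
cache.speculative_load(0x0000)   # line 0 maps to set 0
cache.speculative_load(0x0100)   # line 4 also maps to set 0 and replaces line 0
print(cache.failed)
```

Here the speculation fails solely because two speculatively loaded lines compete for the same cache set; no other thread wrote to either line. This is the kind of failure that the text above attributes to replacements rather than to coherence conflicts.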
Hence, what is needed is a method and apparatus that facilitates reordering loads, such as speculative loads, without the above-described performance problems.