1. Field of the Invention
The present invention relates to the design of computer systems. More specifically, the present invention relates to a technique for preventing store starvation in a computer system that supports marked coherence.
2. Related Art
Advances in semiconductor fabrication technology have given rise to dramatic increases in microprocessor clock speeds. This increase in microprocessor clock speeds has not been matched by a corresponding increase in memory access speeds. Hence, the disparity between microprocessor clock speeds and memory access speeds continues to grow, and is beginning to create significant performance problems. Execution profiles for fast microprocessor systems show that a large fraction of execution time is spent not within the microprocessor core, but within memory structures outside of the microprocessor core. This means that these microprocessor systems spend a large fraction of their time waiting for memory references to complete instead of performing computational operations.
Efficient caching schemes can help to reduce the number of accesses to memory. However, when a memory operation, such as a load, generates a cache miss, the subsequent access to level-two (L2) cache or memory can require dozens or hundreds of clock cycles to complete, during which time the processor is typically idle, performing no useful work.
One way to mitigate this problem is to speculatively execute subsequent instructions (including loads) during cache misses. Specifically, the processor does not wait for loads that generate cache misses to complete, but instead speculatively performs subsequent loads. Consequently, a large number of loads can be speculatively performed out of program order. Eventually, the processor completes the earlier loads, and if the speculative execution is successful, commits the speculative loads to the architectural state of the processor.
Some existing speculative-execution techniques use dedicated hardware structures which maintain the addresses of speculative loads while snooping invalidations to detect if any of the speculatively-loaded cache lines is invalidated. These existing techniques will “fail” a speculative load if such an invalidation is detected. Unfortunately, these existing techniques require dedicated hardware resources that do not scale well for a large number of speculative loads.
Another technique uses metadata in the L1 data cache to indicate if a thread has speculatively loaded the cache line. (See U.S. Pat. No. 7,089,374, entitled, “Selectively Unmarking Load-Marked Cache Lines during Transactional Program Execution,” by inventors Marc Tremblay and Shailender Chaudhry.) This technique “fails” a speculative load if the corresponding speculatively-loaded cache line is invalidated or replaced from the L1 data cache. Unfortunately, because such invalidations and replacements occur more frequently than common coherence conflicts, they cause a significant number of failed speculative loads. These failed speculative loads consume a disproportionate amount of memory bandwidth and reduce the performance of non-speculative loads.
Some processor designers have suggested allowing threads to place “load marks” on cache lines that have been speculatively loaded. While the cache line is load-marked, no other thread is permitted to store to the cache line. However, other threads are allowed to continue loading from and load-marking the cache line. Hence, multiple threads can read from the cache line and each of these threads can prevent other threads from storing to the cache line. For more details on load-marking cache lines, see “Facilitating Load Reordering through Cache Line Marking” by the same inventors as the instant application, having Ser. No. 11/591,225, and filing date TO BE ASSIGNED.
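The load-marking rule described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism of the referenced application: the class name `MarkedCacheLine`, its methods, and the thread identifiers are hypothetical, and the model assumes that a store is blocked only by load marks held by threads other than the storing thread.

```python
class MarkedCacheLine:
    """Toy model of one cache line with per-thread load marks (hypothetical)."""

    def __init__(self, value=0):
        self.value = value
        self.load_marks = set()  # IDs of threads currently holding a load mark

    def load_mark(self, thread_id):
        # Any number of threads may load from and load-mark the same line.
        self.load_marks.add(thread_id)
        return self.value

    def clear_mark(self, thread_id):
        # A thread removes only its own load mark.
        self.load_marks.discard(thread_id)

    def try_store(self, thread_id, value):
        # Assumption: only marks held by *other* threads block the store.
        if self.load_marks - {thread_id}:
            return False
        self.value = value
        return True


line = MarkedCacheLine(value=7)
line.load_mark("T1")
line.load_mark("T2")                       # a second reader is allowed
blocked = line.try_store("T3", 99)         # T1 and T2 both hold marks
line.clear_mark("T1")
still_blocked = line.try_store("T3", 99)   # T2 alone still blocks the store
line.clear_mark("T2")
stored = line.try_store("T3", 99)          # no marks remain
```

In this model each reader independently blocks the store, so the store proceeds only after every reader has cleared its mark.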
Unfortunately, a problem called “store starvation” can arise when threads place load marks on a cache line in an overlapping fashion, so that there is never a moment at which all of the load marks have been cleared from the cache line. Because the presence of any load mark on a cache line prevents other threads from storing to the cache line, no thread will ever be able to store to the cache line. In this situation, the threads that attempt to store are blocked indefinitely and consequently suffer store starvation.
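The starvation scenario can be made concrete with a deterministic toy simulation. It is a sketch under assumptions rather than a model of any particular processor: the function `simulate`, the reader names "A" and "B", and the step-by-step interleaving are all hypothetical. Two readers take turns load-marking one cache line, and a writer attempts one store per step, which succeeds only if the line carries no load marks at that moment.

```python
def simulate(steps, overlap=True):
    """Count successful store attempts by a writer on one load-marked line."""
    marks = {"A"}      # reader A holds the first load mark
    successes = 0
    for step in range(steps):
        # Readers alternate between the incoming and outgoing roles.
        incoming, outgoing = ("B", "A") if step % 2 == 0 else ("A", "B")
        if overlap:
            # The incoming reader marks the line before the outgoing
            # reader clears its mark, so the marks are never all cleared.
            marks.add(incoming)
        marks.discard(outgoing)
        # The writer attempts its store; any remaining load mark blocks it.
        if not marks:
            successes += 1
        marks.add(incoming)
    return successes


starved = simulate(1000, overlap=True)     # the writer never finds the line unmarked
serviced = simulate(1000, overlap=False)   # a gap opens between marks on every step
```

With overlapping marks the writer fails on every attempt, even though each individual reader holds its mark only briefly. Without overlap, the same readers leave a gap in which the store can proceed.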
Hence, what is needed is a method and apparatus that facilitates marking cache lines without the above-described problem.