1. Field
The present invention relates to the design of processors within computer systems. More specifically, the present invention relates to a technique for avoiding deadlock while attempting to acquire store-marks on cache lines, wherein a store-mark on a cache line indicates that one or more store buffer entries associated with the cache line are waiting to be committed.
2. Related Art
Advances in semiconductor fabrication technology have given rise to dramatic increases in microprocessor clock speeds. This increase in microprocessor clock speeds has not been matched by a corresponding increase in memory access speeds. Hence, the disparity between microprocessor clock speeds and memory access speeds continues to grow, and is beginning to create significant performance problems. Execution profiles for fast microprocessor systems show that a large fraction of execution time is spent not within the microprocessor core, but within memory structures outside of the microprocessor core. This means that the microprocessor systems spend a large fraction of time waiting for memory references to complete instead of performing computational operations.
Efficient caching schemes can help reduce the number of memory accesses that are performed. However, when a memory reference, such as a load, generates a cache miss, the subsequent access to level-two (L2) cache or memory can require dozens or hundreds of clock cycles to complete, during which time the processor is typically idle, performing no useful work.
In contrast, cache misses during stores typically do not affect processor performance as much because the processor usually places the stores into a “store queue” and continues executing subsequent instructions. Existing store queue designs typically maintain an array of pending stores in program order. Some of these pending stores may be directed to the same word in the same cache line. In particular, if consecutive stores are directed to the same word, these stores can be merged into a single entry in the store queue without violating a conventional memory model, such as the Total-Store-Order (TSO) memory model. This merging reduces the memory bandwidth consumed because fewer memory accesses are performed.
However, when “non-consecutive” stores (that is, stores that are separated, in program order, by one or more stores by the same thread to a different word) directed to the same word are pending in a store queue, these non-consecutive stores typically cannot be merged without violating a conventional memory model, such as TSO. TSO is violated because merging non-consecutive stores effectively reorders the stores with respect to the intervening memory accesses.
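The merging rule described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of any actual store queue hardware: the names `StoreEntry` and `StoreQueue`, the word-granular addresses, and the example values are all hypothetical. The model merges a new store into the newest pending entry only when both target the same word, so consecutive stores collapse while non-consecutive stores to the same word remain separate entries.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    addr: int   # word address (hypothetical granularity)
    value: int  # data to be written


class StoreQueue:
    """Toy program-order store queue (illustrative only)."""

    def __init__(self):
        self.entries = []  # pending stores, oldest first

    def push(self, addr, value):
        # Merge only when the newest pending store targets the same word;
        # no other store lies between the two, so program order is kept.
        if self.entries and self.entries[-1].addr == addr:
            self.entries[-1].value = value
        else:
            self.entries.append(StoreEntry(addr, value))


q = StoreQueue()
q.push(0x100, 1)
q.push(0x100, 2)  # consecutive store to the same word: merged
q.push(0x200, 3)  # store to a different word
q.push(0x100, 4)  # non-consecutive store to 0x100: kept separate
drained = [(e.addr, e.value) for e in q.entries]
```

In this model four stores occupy three entries. Folding the last store into the first entry would save one more entry, but the write of 4 to word 0x100 would then become visible before the write of 3 to word 0x200, which is the reordering that TSO forbids.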
This problem can be mitigated by “store-marking” cache lines to indicate that one or more store buffer entries are waiting to be committed to the cache lines, and then delaying accesses to the store-marked cache lines by other threads. In this way, stores to a given cache line can be reordered, thereby allowing non-consecutive stores to be merged without violating TSO.
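The effect of a store-mark can be sketched as follows. This is a simplified software model under assumptions of our own, not the hardware mechanism itself: the class `CacheLine`, its methods, the helper `merge_stores`, and the use of integer thread identifiers are all hypothetical. While one thread holds the store-mark, accesses by other threads are reported as delayed, so the holder can merge its pending stores to that line, including non-consecutive stores to the same word, before any other thread can observe an intermediate state.

```python
class CacheLine:
    """Toy cache line with a single store-mark (illustrative only)."""

    def __init__(self):
        self.data = {}          # word offset -> value
        self.mark_owner = None  # id of the thread holding the store-mark

    def try_store_mark(self, tid):
        # Succeeds if the line is unmarked or already marked by this thread.
        if self.mark_owner is None or self.mark_owner == tid:
            self.mark_owner = tid
            return True
        return False

    def access_allowed(self, tid):
        # Accesses by other threads are delayed while the mark is held.
        return self.mark_owner is None or self.mark_owner == tid

    def commit_and_release(self, tid, merged):
        # Commit the merged stores at once, then clear the store-mark.
        assert self.mark_owner == tid
        self.data.update(merged)
        self.mark_owner = None


def merge_stores(stores):
    """Collapse (offset, value) stores to one line; the last write per word wins."""
    merged = {}
    for offset, value in stores:
        merged[offset] = value
    return merged


line = CacheLine()
marked = line.try_store_mark(0)                    # thread 0 marks the line
blocked = not line.access_allowed(1)               # thread 1 must wait
merged = merge_stores([(0, 1), (8, 2), (0, 3)])    # non-consecutive stores to word 0
line.commit_and_release(0, merged)
visible = line.access_allowed(1)                   # thread 1 may now proceed
```

Because thread 1 cannot access the line until the mark is released, it only ever sees the line before the stores or after all of them.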
For efficiency reasons, it is desirable to process store-mark requests in a pipelined manner, so that a given store-mark request can be initiated before preceding store-mark requests for the same thread complete. Unfortunately, this can give rise to a deadlock condition when one or more other threads are attempting to store-mark the same cache lines. For example, assume that a first thread attempts to store-mark a cache line A and then attempts to store-mark a cache line B. At the same time, assume that a second thread attempts to store-mark cache line B and then attempts to store-mark cache line A. Because the requests are pipelined, a later request can succeed while an earlier request by the same thread is still outstanding. If the first thread successfully store-marks cache line B and the second thread successfully store-marks cache line A, a deadlock condition can arise. In particular, a deadlock will arise if the first thread is waiting for the second thread to release the store-mark on cache line A before the first thread will release the store-mark on cache line B, and if the second thread is waiting for the first thread to release the store-mark on cache line B before the second thread will release the store-mark on cache line A.
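The circular wait in this example can be made concrete with a small wait-for check. The sketch below is only an illustration of why the scenario deadlocks; it is not a proposed avoidance technique, and the function `find_deadlock` and the dictionaries `holders` and `waiting` are hypothetical names introduced here. It assumes that each thread waits on at most one cache line at a time.

```python
def find_deadlock(holders, waiting):
    """Return the thread ids forming a wait-for cycle, or None.

    holders: cache line -> id of the thread holding its store-mark
    waiting: thread id -> cache line whose store-mark it is waiting for
    """
    for start in waiting:
        seen = []
        t = start
        # Follow the chain: t waits for a line, which is held by the next thread.
        while t in waiting:
            if t in seen:
                return seen[seen.index(t):]  # chain looped back: a cycle
            seen.append(t)
            t = holders.get(waiting[t])
    return None


# Scenario from the text: thread 1 holds B and waits for A,
# while thread 2 holds A and waits for B.
holders = {"A": 2, "B": 1}
waiting = {1: "A", 2: "B"}
cycle = find_deadlock(holders, waiting)

# Contrast: thread 2 waits for A, but thread 1 (the holder) waits for nothing.
no_cycle = find_deadlock({"A": 1}, {2: "A"})
```

In the first scenario the chain of waits returns to its starting thread, so neither thread can ever release its store-mark. In the second, the holder is not waiting on anything and will eventually release cache line A.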
Hence, what is needed is a method and an apparatus for avoiding deadlock while store-marking cache lines.