In multiprocessor systems with shared memory architectures, maintaining memory or cache coherence is a well-recognized challenge. For example, multiprocessor systems may have several processors, each of which may include one or more levels of cache memory, ultimately coupled to a main memory. Because the main memory is shared among the several processors, it is necessary that a consistent view of the contents of the main memory is provided to all the processors.
It is often the case that updates to data (or instructions) residing in one or more levels of cache may not be immediately reflected in all other occurrences of the same data in the memory system. This destroys coherence. The problem is exacerbated in write-back caches. Writing to caches may be performed as a write-through, wherein every write to a cache causes a synchronous write to the backing storage locations of the data in the next levels of cache and main memory; or as a write-back, wherein a data write to the cache is updated in the backing storage locations of the data only when the corresponding cache line (or “cache block”) is evicted from the cache. While write-through caches are friendlier to cache coherency, they are also much slower because every cache write suffers from the additional time required to update the backing storage locations. On the other hand, while write-back caches expedite cache writes because only the cache is written during a normal write operation, they may destroy coherency by not immediately updating the newly written data in the backing storage locations.
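The difference between the two write policies can be illustrated with a minimal sketch. The class names, the use of a dictionary as the backing storage, and the explicit `evict` method are assumptions made for illustration only; they do not describe any particular cache implementation.

```python
class WriteThroughCache:
    """Every write updates the cache and, synchronously, the backing storage."""

    def __init__(self, backing):
        self.backing = backing  # dict standing in for next-level cache / main memory
        self.lines = {}         # address -> value held in this cache

    def write(self, addr, value):
        self.lines[addr] = value
        self.backing[addr] = value  # synchronous update of the backing storage


class WriteBackCache:
    """Writes update only the cache; backing storage is updated on eviction."""

    def __init__(self, backing):
        self.backing = backing
        self.lines = {}
        self.dirty = set()  # addresses modified since they were last written back

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)  # backing storage is now stale for this address

    def evict(self, addr):
        value = self.lines.pop(addr)
        if addr in self.dirty:
            self.dirty.discard(addr)
            self.backing[addr] = value  # write the dirty line back on eviction


# Write-through: the backing storage sees the new value immediately.
memory_wt = {0x40: 1}
wt = WriteThroughCache(memory_wt)
wt.write(0x40, 2)

# Write-back: the backing storage stays stale until the line is evicted.
memory_wb = {0x40: 1}
wb = WriteBackCache(memory_wb)
wb.write(0x40, 2)
stale_before_evict = memory_wb[0x40]
wb.evict(0x40)
```

In this sketch, the window between `wb.write` and `wb.evict` is the period during which another reader of the backing storage would observe the old value.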
A commonly used mechanism to maintain coherence, particularly in write-back caches, involves the so-called MESI protocol. The MESI protocol defines four states for every cache line: Modified (M), Exclusive (E), Shared (S), and Invalid (I). The Modified state indicates that the cache line is present only in the instant cache, but it is “dirty,” i.e. it has been modified from the value in main memory. The Exclusive state indicates that only the instant cache possesses the cache line, and it is “clean,” i.e. it matches the value in main memory. The Shared state indicates that the cache line is clean, but copies of the cache line may also be present in one or more other caches in the memory system. The Invalid state indicates that the cache line is invalid. Common variations of the MESI protocol, such as the MOESI protocol, may involve additional states such as an Owned (O) state wherein a cache line is indicated to hold the most recent, but dirty and shared copy of the data.
Coherency is maintained by communication between the various processing elements related to desired memory accesses, and by managing permissions for updates to caches and main memory based on the state (M/O/E/S/I) of the cache lines. For example, if a processor in the multiprocessor system desires to write data to a cache line of a level 1 (L1) cache associated with it, then if the cache line is in the Exclusive (E) state, the processor may write the cache line and update it to the Modified (M) state. On the other hand, if the cache line is in the Shared (S) state, then all other copies of the cache line must first be invalidated before the processor may be permitted to write the cache line. Particular implementations of coherency protocols such as MESI/MOESI are well known in the art and will not be further described herein.
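The write handling described above can be sketched as follows. This is a simplified illustration, not a complete MESI implementation: the `State` enum, the `write_line` function, and the representation of each cache as a dictionary from address to state are hypothetical, and the handling of a write to an Invalid line is deliberately left out.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def write_line(caches, writer, addr):
    """Apply a write by cache number `writer` to `addr`.

    `caches` is a list of dicts mapping address -> State (an assumed model).
    Returns the number of remote copies that had to be invalidated first.
    """
    state = caches[writer].get(addr, State.INVALID)
    if state is State.INVALID:
        # Obtaining the line on a write miss is outside this sketch.
        raise LookupError("write miss: line must be obtained first")

    invalidated = 0
    if state is State.SHARED:
        # All other copies must be invalidated before the write is permitted.
        for i, cache in enumerate(caches):
            if i != writer and cache.get(addr, State.INVALID) is not State.INVALID:
                cache[addr] = State.INVALID
                invalidated += 1

    # Exclusive and Modified lines can be written without further messages.
    caches[writer][addr] = State.MODIFIED
    return invalidated


caches = [
    {0x80: State.SHARED},
    {0x80: State.SHARED},
    {0x100: State.EXCLUSIVE},
]
n_shared = write_line(caches, 0, 0x80)      # Shared: remote copy invalidated
n_exclusive = write_line(caches, 2, 0x100)  # Exclusive: silent upgrade to Modified
```

The Shared case is the one that requires communication with other caches, which is the source of the stall discussed next.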
Accordingly, in conventional implementations of coherency protocols such as MESI/MOESI, a write to a cache line may be stalled until write permissions have been obtained. For example, to obtain permissions for a cache line residing in an L1 cache, conventional implementations may require traversing one or more levels down the memory hierarchy to a point of coherence, if the state of the cache line indicates that permissions cannot be obtained locally. In other words, if the state of the cache line in the L1 cache dictates that write permissions are not currently available, then backing storage locations in higher levels of the memory hierarchy, such as a level 2 (L2) cache or main memory, may need to be queried to determine where the point of coherence for the cache line is (again, based on the state of the cache line in these backing storage locations). This process of obtaining write permissions may incur severe penalties in terms of latency and power.
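The traversal described above can be sketched under simplifying assumptions. Here a level is treated as able to grant write permission when it holds the line in the M or E state, and main memory is treated as the final point of coherence; both rules, along with the function name and data layout, are illustrative assumptions rather than a description of any specific protocol implementation.

```python
def levels_queried_for_write(hierarchy, addr):
    """Return the names of the levels queried before a write can proceed.

    `hierarchy` is a list of (name, {addr: state}) pairs ordered from the L1
    cache outward, excluding main memory. States are single letters (M/E/S/I).
    Assumption: a level holding the line in M or E can grant write permission.
    """
    queried = []
    for name, lines in hierarchy:
        queried.append(name)
        if lines.get(addr, "I") in ("M", "E"):
            return queried
    # No cache level could grant permission; fall through to main memory.
    queried.append("main memory")
    return queried


local = levels_queried_for_write([("L1", {0x80: "E"}), ("L2", {})], 0x80)
one_level = levels_queried_for_write([("L1", {0x80: "S"}), ("L2", {0x80: "E"})], 0x80)
to_memory = levels_queried_for_write([("L1", {0x80: "S"}), ("L2", {0x80: "S"})], 0x80)
```

Each additional entry in the returned list stands for one more level queried, and therefore for additional latency and power spent before the write can complete.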
Some write-back cache architectures may be designed according to a no-write-allocate or write-no-allocate policy. In such architectures, if a miss is encountered for the cache line write in the L1 cache, then the write operation skips allocating the cache line in the L1 cache (i.e. does not fetch the cache line from backing storage locations to the L1 cache) and proceeds to writing the cache line in the backing storage, such as the L2 cache or main memory, where the cache line will be found. However, once again, permissions will need to be obtained at the backing storage location where the cache line is found, thus incurring associated penalties.
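A minimal sketch of the no-write-allocate behavior follows. The function name and the use of dictionaries for the L1 cache and the backing storage are assumptions for illustration, and the permission check that would still be required at the backing storage is not modelled.

```python
def write_no_allocate(l1, backing, addr, value):
    """Write `value` to `addr` under a no-write-allocate policy.

    `l1` and `backing` are dicts mapping address -> value (an assumed model).
    Returns "L1" on a hit and "backing" on a miss.
    """
    if addr in l1:
        l1[addr] = value  # hit: only the L1 copy is written (write-back cache)
        return "L1"
    # Miss: do not fetch the line into L1; write the backing storage directly.
    backing[addr] = value
    return "backing"


l1 = {0x10: 5}
backing = {0x10: 5, 0x20: 7}
hit_target = write_no_allocate(l1, backing, 0x10, 6)
miss_target = write_no_allocate(l1, backing, 0x20, 8)
```

After the missing write, address `0x20` is updated in the backing storage but is still absent from the L1 cache, which is the defining property of the policy.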
Thus, conventional implementations suffer from the aforementioned drawbacks associated with maintaining cache coherency and obtaining permissions for write operations. Accordingly, there is a corresponding need in the art for expediting write operations to caches in multiprocessor systems with shared memory architectures.