In a multiprocessor system utilizing shared memory, cache protocols are required to ensure data consistency; to the overall system, however, the presence of caches should be transparent. Various protocols are implemented within such systems, including write-through actions (stores are written to the cache and to storage at the same time) and write-back actions (stores are written to the cache and are written to storage only when the line in the cache needs to be displaced for a more recent requester or when another processor requires the cached line).
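The difference between the two store policies can be illustrated with a minimal sketch. The class names (Memory, WriteThroughCache, WriteBackCache) and the demo function are hypothetical and chosen only for illustration; the model tracks single addresses rather than full cache lines and ignores capacity limits.

```python
class Memory:
    """Backing storage; counts how many stores actually reach it."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, addr, value):
        self.data[addr] = value
        self.writes += 1


class WriteThroughCache:
    """Every store updates the cache and storage at the same time."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def store(self, addr, value):
        self.lines[addr] = value
        self.memory.write(addr, value)


class WriteBackCache:
    """Stores update only the cache; storage is updated on displacement."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()  # addresses changed since they were cached

    def store(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def evict(self, addr):
        value = self.lines.pop(addr)
        if addr in self.dirty:
            # Only a changed line has to be written back.
            self.dirty.discard(addr)
            self.memory.write(addr, value)


def demo():
    """Store three values to one address under each policy."""
    wt_mem, wb_mem = Memory(), Memory()
    wt, wb = WriteThroughCache(wt_mem), WriteBackCache(wb_mem)
    for value in range(3):
        wt.store(0x40, value)
        wb.store(0x40, value)
    before_evict = (wt_mem.writes, wb_mem.writes)
    wb.evict(0x40)
    after_evict = (wt_mem.writes, wb_mem.writes)
    return before_evict, after_evict
```

In this sketch the write-through cache sends every store to storage, while the write-back cache sends only the final value, and only when the line is displaced.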
In a write-back protocol, writes may be made to the cache exclusively. These lines must then be marked as changed or modified so that the cache control logic will know whether to write the line back to system memory or to discard it. The modified state is marked by an M bit, which improves performance because unmodified lines need not be written back to system memory. A well-known protocol (MESI) is often utilized, in which there is a Modified (M) bit as discussed previously, an Exclusive (E) bit indicating that the line exists only in this cache, a Shared (S) bit indicating that the line can be shared by multiple users at one time, and an Invalid (I) bit indicating that the line is not available in the cache.
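As a rough illustration of how the four states drive the write-back decision, the following sketch encodes the MESI states and the action taken when a line is displaced. The names State and eviction_action, and the action strings, are assumptions made for this example only.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # line changed in this cache; system memory is stale
    EXCLUSIVE = "E"  # line exists only in this cache and is unchanged
    SHARED = "S"     # line may also be held by other caches
    INVALID = "I"    # line is not available in this cache


def eviction_action(state):
    """Return what the cache control logic does when displacing a line."""
    if state is State.MODIFIED:
        return "write_back"  # the only case that costs a memory write
    if state is State.INVALID:
        return "none"        # nothing is cached, so nothing to do
    return "discard"         # E or S: memory already holds the same data
```

Only a Modified line produces a write to system memory, which is the performance benefit attributed to the M bit above.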
In certain systems, two or more serial caches may be utilized. For example, there may be a Level 1 (L1) cache associated with each processor (implemented as a write-through cache). The L1 cache is small so that it fits on the same chip as the processor, optimizing access time. Off the processor chip there may be a larger and slower Level 2 (L2) cache, which is nevertheless faster than main system memory. This L2 cache is implemented as a write-back cache and may contain a superset of the data in the L1 cache. The L2 cache is generally much larger than the L1 cache; each cache holds the set of lines most recently accessed by the associated processor, the size of that set being proportional to the size of the cache.
The L2 and L1 caches utilize the MESI protocol such that for each cache line, there is an M, E, S, or I state that indicates the current state of the cache line in the system. The L1 cache does not require the Exclusive bit in systems where it is the L2 cache's responsibility to manage line MESI state changes. In these systems, a line marked Exclusive in L2 would be marked Shared in L1. If another processor wants to share a copy of this line, the L2 indicates via its snoop response that the line is Shared and changes the state of the L2 copy of the line to Shared. Because the L1 line state does not need to be changed, the L1 cache does not need to be involved in the line state change, which improves performance.
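The division of responsibility described above might be sketched as follows. The class TwoLevelCache, its methods, and the l1_updates counter are hypothetical names introduced for illustration; the sketch covers only the Exclusive and Shared cases and deliberately leaves Modified lines out.

```python
class TwoLevelCache:
    """One processor's L1/L2 pair; the L2 owns all MESI state changes."""

    def __init__(self):
        self.l1 = {}          # addr -> "S" or "I" (the L1 never records E)
        self.l2 = {}          # addr -> "M", "E", "S" or "I"
        self.l1_updates = 0   # how often the L1 state had to be touched

    def _set_l1(self, addr, state):
        self.l1[addr] = state
        self.l1_updates += 1

    def fill_exclusive(self, addr):
        """Bring in a line that no other cache holds."""
        self.l2[addr] = "E"
        self._set_l1(addr, "S")  # Exclusive in L2 is marked Shared in L1

    def snoop_read(self, addr):
        """Answer another processor's read of addr on behalf of both levels."""
        state = self.l2.get(addr, "I")
        if state == "E":
            self.l2[addr] = "S"  # only the L2 state changes
            return "Shared"
        if state == "S":
            return "Shared"
        if state == "M":
            raise NotImplementedError("Modified lines are outside this sketch")
        return None              # line not held; nothing to report
```

Because the L1 copy is already marked Shared, a snoop that demotes the L2 copy from Exclusive to Shared leaves l1_updates unchanged.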
In systems employing "snoopy" bus devices, each bus device coupled to the shared bus monitors an operation, such as a read operation, to determine whether its own cache contains a more recent (often modified) copy of the requested data than system memory holds. This is often referred to as coherency checking, wherein the system ensures that the most recent and valid version of requested data is sent to the requesting device, regardless of whether the memory system or one of the bus devices currently holds the most recent version.
In a multiprocessor environment, snoop latency may be fixed, which means that when a processor makes a storage request on the system bus, all other processors, or bus devices, must respond within a fixed period of time. In the event the storage request is a line read, other processors or devices which have a copy of the line are allowed to respond only with Shared or Modified. (A processor is not allowed to keep exclusive ownership of the line in this case.) If the snoop response is Modified, the processor holding the modified line must provide the current copy to the requester and change the state of its copy of the line to Shared or Invalid, depending on the "snoopy" bus protocol. In systems where the L1 cache cannot be snooped, or the L1 cache snoop response cannot meet the fixed response time requirement of the "snoopy" bus, the L2 cache must mark a line as Modified prior to any processor store to that line.
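The snoop rules for a line read can be summarized in a small sketch. The function name snoop_line_read, its modified_goes_to parameter, and the tuple it returns are assumptions for illustration; the use of None for a device that holds no copy is likewise a modelling choice, since the text above does not name that response.

```python
def snoop_line_read(state, modified_goes_to="S"):
    """Return (response, next_state, supplies_data) for a snooped line read.

    state is the snooping device's MESI state for the line.
    modified_goes_to selects whether a Modified line ends up Shared ("S")
    or Invalid ("I"), which depends on the particular bus protocol.
    """
    if modified_goes_to not in ("S", "I"):
        raise ValueError("a Modified line may only become Shared or Invalid")
    if state == "M":
        # The holder must provide the current copy to the requester.
        return "Modified", modified_goes_to, True
    if state in ("E", "S"):
        # Exclusive ownership may not be kept once another device reads.
        return "Shared", "S", False
    return None, "I", False  # no copy held
```

For example, snoop_line_read("E") yields a Shared response and a Shared next state, reflecting the rule that exclusive ownership cannot survive a line read by another device.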
The problem with this approach is that latency is introduced when a processor decides to do a store to a cache line that is Exclusive in its L2 cache. Normally, the processor would be required to request that the L2 cache initiate a state change to Modified for the cache line prior to storing the data. This may require several cycles, since the MESI state for each line is held in a relatively slow cache tag array. In order to change the MESI state, the L2 cache tag array has to be read to determine the current state of the line and then updated to reflect the Modified state. In addition, the operation is generally pipelined, which adds to the latency. As a result of the foregoing, there is a need in the art for an improvement in the performance of cache coherency within a multiprocessor system.
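The source of the extra latency can be made concrete by counting tag array accesses rather than cycles, since actual cycle counts depend on the implementation. The names L2TagArray and prepare_store are hypothetical, and the sketch models only lines that are already Exclusive or Modified in the L2 cache.

```python
class L2TagArray:
    """Holds the MESI state per line and counts accesses to the array."""

    def __init__(self):
        self.state = {}
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.state.get(addr, "I")

    def write(self, addr, new_state):
        self.writes += 1
        self.state[addr] = new_state


def prepare_store(tags, addr):
    """Make the L2 line Modified before a processor store.

    Returns the number of tag array accesses that had to complete first.
    """
    before = tags.reads + tags.writes
    state = tags.read(addr)      # read the tag to determine the state
    if state == "E":
        tags.write(addr, "M")    # then update it to Modified
    elif state != "M":
        # Shared or Invalid lines need a bus transaction, not modelled here.
        raise NotImplementedError("only E and M lines are modelled")
    return tags.reads + tags.writes - before
```

In this model a store to an Exclusive line waits on two tag array accesses (a read followed by a write), whereas a store to a line that is already Modified waits on one.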