1. Technical Field
The present invention relates generally to data processing systems and in particular to a method and system for providing cache coherency for speculatively-issued full cache line writes in a data processing system. Still more particularly, the present invention relates to a cache coherency protocol that enables a DMA Claim response without cache-to-cache transfer of data when the requesting device is going to overwrite the entire cache line.
2. Description of the Related Art
A conventional multiprocessor data processing system (referred to hereinafter as an MP) typically comprises a system memory, input/output (I/O) devices, a plurality of processing elements that each include a processor, and one or more levels of cache memory. The combination of the caches and system memory provides a memory hierarchy that is typically consistent.
The caches are commonly used to temporarily store values that might be repeatedly accessed by a processor or other device (e.g., I/O), in order to speed up processing by avoiding the longer step of loading the values from memory. Each cache has an associated cache controller that manages the transfer of data and instructions between the processor core and the cache memory and coordinates coherency operations for that cache.
In addition to processor caches, other types of caches are often implemented to provide temporary storage to a device that frequently accesses data that is stored in or retrieved from memory. For example, an I/O cache may be utilized to stage data transmissions to and from the I/O devices. The I/O cache enables buffering of data being transmitted to the I/O device or being sent to memory from the I/O device.
With multiple caches within the memory hierarchy, a coherent structure is required for valid execution results in the MP. This coherent structure provides a single view of the contents of memory to all of the processors and other memory access devices, e.g., I/O devices. A coherent memory hierarchy is maintained through the use of a coherency protocol, such as the MESI protocol. FIG. 4 illustrates the possible state transitions when supporting cache coherency operations with the MESI protocol. As illustrated, with the MESI protocol, a cache line may be tagged with one of four states, “M” (Modified), “E” (Exclusive), “S” (Shared) or “I” (Invalid).
In the MESI protocol, an indication of a coherency state is stored in association with each coherency granule (e.g., cache line or sector) of at least all upper level (cache) memories. Each coherency granule can have one of the four MESI states, which is indicated by bits in the cache directory's SRAM. The modified state indicates that a coherency granule is valid only in the cache storing the modified coherency granule and that the value of the modified coherency granule has not been written to system memory. When a coherency granule is indicated as exclusive, the coherency granule is resident in, of all caches at that level of the memory hierarchy, only the cache having the coherency granule in the exclusive state. The data in the exclusive state is consistent with system memory, however. If a coherency granule is marked as shared in a cache directory, the coherency granule is resident in the associated cache and in at least one other cache at the same level of the memory hierarchy, all of the copies of the coherency granule being consistent with system memory. Finally, the invalid state indicates that the data and address tag associated with a coherency granule are both invalid.
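The four states described above can be summarized in a short sketch. The following Python fragment is illustrative only; the names `MESI`, `PROPERTIES`, `is_valid`, and `needs_writeback` are hypothetical and are not part of any described implementation. It simply records, for each state, whether the granule is valid in the local cache, whether other caches at the same level may hold it, and whether it is consistent with system memory.

```python
from enum import Enum


class MESI(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# Hypothetical summary table, one entry per state:
# (valid in this cache, may be held by other caches, consistent with memory)
# None marks a property that has no meaning for an invalid granule.
PROPERTIES = {
    MESI.MODIFIED: (True, False, False),
    MESI.EXCLUSIVE: (True, False, True),
    MESI.SHARED: (True, True, True),
    MESI.INVALID: (False, None, None),
}


def is_valid(state: MESI) -> bool:
    """Return True if the granule holds usable data in this cache."""
    return PROPERTIES[state][0]


def needs_writeback(state: MESI) -> bool:
    """Return True if the cached value has not been written to system memory."""
    valid, _, consistent = PROPERTIES[state]
    return valid and consistent is False
```

Under these assumptions, only the modified state reports that a write-back to system memory is outstanding.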
The state to which each coherency granule (e.g., cache line) is set is dependent upon both a previous state of the cache line and the type of memory access sought by a requesting processor. Accordingly, maintaining memory coherency in the multiprocessor data processing system requires that the processors communicate messages across the system bus indicating their intention to read or write memory locations. For example, when a processor desires to write data to a memory location, the processor must first inform all other processing elements of its intention to write data to the memory location and receive permission from all other processing elements to carry out the write operation. The permission messages received by the requesting processor indicate that all other cached copies of the contents of the memory location have been invalidated, thereby guaranteeing that the other processors will not access stale local data.
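The request-and-permission exchange described above may be sketched as follows. This is a minimal model under stated assumptions, not a description of any actual bus protocol: the `Cache` class, the `snoop_write_intent` method, and the `request_write` function are hypothetical names, states are tracked as single letters, and the movement of data (including any write-back of a modified copy) is deliberately omitted.

```python
class Cache:
    """Hypothetical cache that tracks only a coherency state per address."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # address -> "M", "E", "S" or "I"

    def snoop_write_intent(self, addr):
        """Invalidate any local copy and return a permission message."""
        self.state[addr] = "I"
        return True


def request_write(requester, all_caches, addr):
    """Announce the intent to write and collect permission from every other cache.

    The requester's line becomes Modified only if all others acknowledge.
    """
    acks = [c.snoop_write_intent(addr) for c in all_caches if c is not requester]
    if all(acks):
        requester.state[addr] = "M"
        return True
    return False
```

For example, if two other caches hold the line in the shared state, a call to `request_write` leaves both of them invalid and the requester modified, so that neither of the other processors can read stale local data.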
Typical processor operations that affect the coherency state of the cache lines include reads, stores, DClaims, cast out (CO), read-with-intent-to-modify (RWITM), and data cache block set to zero (DCBZ) operations. I/O devices may also affect the coherency state of the cache lines, and these operations include direct memory access (DMA) reads, DMA writes, DMA Claim, CO, and RWITM. Most of the processor operations (except the DCBZ) require access to only a portion of a cache line at a time. For example, with a 128B cache line, a 64B store operation may be completed by the processor and affects only the first or second 64B of data in the cache line. However, the I/O device operations all require access to the entire cache line. A DMA Write, for example, requires access to the entire 128B cache line and overwrites the entire 128B cache line. DCBZ is one processor operation that also requires access to the full cache line and overwrites the entire cache line.
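The difference between a partial-line store and a full-line overwrite can be illustrated with a small sketch. The 128B line size and the 64B store come from the example above; the function names `store`, `dcbz`, and `dma_write` are hypothetical stand-ins for the corresponding operations and model only their effect on the bytes of one cache line.

```python
LINE_SIZE = 128  # bytes per cache line, as in the example above


def store(line: bytearray, offset: int, data: bytes) -> None:
    """Partial write: only the addressed bytes of the line change."""
    if offset < 0 or offset + len(data) > LINE_SIZE:
        raise ValueError("store falls outside the cache line")
    line[offset:offset + len(data)] = data


def dcbz(line: bytearray) -> None:
    """Full-line write: every byte of the line is set to zero."""
    line[:] = bytes(LINE_SIZE)


def dma_write(line: bytearray, data: bytes) -> None:
    """Full-line write: the entire line is replaced by the incoming data."""
    if len(data) != LINE_SIZE:
        raise ValueError("a DMA write covers the whole cache line")
    line[:] = data
```

In this model, a 64B `store` at offset 64 leaves the first 64B of the line untouched, whereas `dcbz` and `dma_write` leave none of the previous content in place.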
DMA writes require that the writing device be given ownership of the cache line so that no other device can access the line until the write is completed. To provide the writing device with sole (i.e., exclusive) ownership of the cache line, a first operation, the DMA Claim operation, is issued on the system bus before the DMA Write is issued. The DMA Claim is an address operation that reserves a particular cache line for receiving the data of the DMA write. When the DMA Claim is snooped by the other devices, the most coherent copy of the data within the caches is immediately placed on the data bus and sent to the cache that now has sole ownership of the cache line (i.e., the cache of the device completing the DMA Claim).
Similar to the DMA write by the I/O device, a DCBZ operation may be issued by a processor that intends to overwrite the content of an entire cache line. The processor is provided sole ownership of the cache line via a DClaim operation, which also forces the most coherent copy of the data to be sent to the processor cache. The DClaim operation is thus similar in functionality to the DMA Claim operation.
Because of the latency involved in providing data on the data bus following a DMA Claim and/or a DClaim operation, current systems typically send these operations out on the address bus ahead of time to reserve the cache line and trigger the movement of the most coherent data to the device's cache from another cache, if required. However, the data sent to the device cache is typically not needed, since the DMA Writes and DCBZ operations overwrite the content of the cache line. Nonetheless, with the MESI protocol, maintaining coherency requires that this sequence of an address operation followed by a data operation be observed. While the data is being transferred, no other device is allowed access to the cache line, and the device writing to the line has to wait until the data arrives before it can complete the write operation. Thus, significant latency is built into this process. Additionally, placing the data on the data bus for cache-to-cache transfer utilizes a substantial amount of bus resources that could be allocated to other processes.
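The cost of the conventional sequence can be made concrete with a simplified sketch. The model below is an assumption-laden illustration, not a description of any real bus: `dma_claim`, `dma_write`, and `claim_then_write` are hypothetical names, a single snooping cache is assumed, and bus usage is counted only as the number of data bytes moved for the cache-to-cache transfer.

```python
LINE_SIZE = 128  # bytes per cache line (assumed, matching the earlier example)


def dma_claim(owner_copy):
    """Conventional claim: any copy held by a snooper crosses the data bus.

    Returns the transferred data (or None) and the data-bus bytes consumed.
    """
    if owner_copy is None:
        return None, 0
    return bytes(owner_copy), LINE_SIZE


def dma_write(new_data):
    """Full-line write: the result depends only on the new data."""
    if len(new_data) != LINE_SIZE:
        raise ValueError("a DMA write covers the whole cache line")
    return bytes(new_data)


def claim_then_write(owner_copy, new_data):
    """Address operation followed by data operation, as described above."""
    transferred, bus_bytes = dma_claim(owner_copy)
    line = transferred          # the writer waits for this data to arrive
    line = dma_write(new_data)  # and then overwrites every byte of it
    return line, bus_bytes
```

In this sketch the final content of the line never depends on the transferred copy, yet a full line of data-bus bandwidth is consumed whenever another cache held the line.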
The present invention recognizes that it would be desirable to provide a method and system by which coherency latency for speculative cache line writes is hidden or substantially reduced. A cache coherency protocol that includes a coherency state to account for speculative, full cache line writes to a cache would be a welcome improvement. These and other features are provided by the invention described herein.