1. Field of the Invention
This invention relates to cache-based computer systems and, more particularly, to cache-based computer systems that include cache write-back mechanisms for maintaining the integrity of data during bus snooping transactions.
2. Description of the Relevant Art
Cache memory subsystems are prevalent within modern-day computer systems and are well known. Exemplary cache memory subsystems are described in a host of publications of the known prior art including, for example, U.S. Pat. Nos. 5,091,875 to Rubinfeld and 5,091,846 to Sachs et al.
A cache memory is a high-speed memory unit interposed in the memory hierarchy of a computer system between a slower system memory and a microprocessor to improve effective memory transfer rates and accordingly improve system performance. The name refers to the fact that the small memory unit is essentially hidden and appears transparent to the user, who is aware only of a larger system memory. The cache is usually implemented by semiconductor memory devices having speeds that are compatible with the speed of the processor, while the system memory utilizes a less costly, lower-speed technology. The cache concept anticipates the likely reuse by the microprocessor of selected data in system memory by storing a copy of the selected data in the cache memory.
A cache memory typically includes a plurality of memory sections, wherein each memory section stores a block or a "line" of two or more words. Each line has associated with it an address tag that uniquely identifies which line of system memory it is a copy of. When a read request originates in the processor for a new word, whether it be data or instruction, an address tag comparison is made to determine whether a copy of the requested word resides in a line of the cache memory. If present, the data is used directly from the cache. This event is referred to as a cache read "hit". If not present, a line containing the requested word is retrieved from system memory and stored in the cache memory. The requested word is simultaneously supplied to the processor. This event is referred to as a cache read "miss".
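The address tag comparison described above can be sketched as follows. This is a minimal illustration only, assuming a direct-mapped organization and a four-word line; the names `Cache`, `LINE_WORDS`, and `read` are hypothetical and are not taken from any particular cache design.

```python
LINE_WORDS = 4  # words per line (assumed for illustration)


class Cache:
    """Hypothetical direct-mapped cache holding whole lines of words."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        # Each slot holds (tag, list_of_words), or None when empty.
        self.slots = [None] * num_lines

    def read(self, address, system_memory):
        line_number = address // LINE_WORDS
        index = line_number % self.num_lines   # which slot the line maps to
        tag = line_number // self.num_lines    # identifies the line in that slot
        offset = address % LINE_WORDS
        slot = self.slots[index]
        if slot is not None and slot[0] == tag:
            # Read "hit": the word is used directly from the cache.
            return slot[1][offset], "hit"
        # Read "miss": fetch the whole line from system memory, store it,
        # and supply the requested word to the processor.
        base = line_number * LINE_WORDS
        words = list(system_memory[base:base + LINE_WORDS])
        self.slots[index] = (tag, words)
        return words[offset], "miss"
```

In this sketch, a second read to any word of a line that was just fetched is a hit, while a read to a different line that maps to the same slot replaces the stored line.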
When the processor desires to write data to memory, a similar address tag comparison is made to determine whether the line into which data is to be written resides in the cache memory. If not present, the line is either fetched into the cache memory from system memory or the data is written directly into the system memory. This event is referred to as a cache write "miss". If the line is present, the data is written directly into the cache memory. This event is referred to as a cache write "hit". As will be explained in greater detail below, in many systems a data "dirty bit" for the cache line is then set. The dirty bit indicates that data stored within the line is dirty (i.e., has been modified and is inconsistent with system memory), and thus, before the line is deleted from the cache memory or overwritten, the modified data must be written back to system memory.
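The write path and the role of the dirty bit may be illustrated by the sketch below. It assumes a fully associative store keyed by tag and a write miss that writes directly into system memory (one of the two miss behaviors named above); `CacheLine`, `write_word`, and `evict` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: int
    words: list
    dirty: bool = False  # set once the line differs from system memory


def write_word(lines, tag, offset, value, system_memory, line_words=4):
    """lines maps tag -> CacheLine (a hypothetical fully associative store)."""
    line = lines.get(tag)
    if line is None:
        # Write "miss": in this sketch the data goes directly to system memory.
        system_memory[tag * line_words + offset] = value
        return "miss"
    # Write "hit": update the cache only and mark the line dirty.
    line.words[offset] = value
    line.dirty = True
    return "hit"


def evict(lines, tag, system_memory, line_words=4):
    """Remove a line; dirty data is written back to system memory first."""
    line = lines.pop(tag)
    if line.dirty:
        base = tag * line_words
        system_memory[base:base + line_words] = line.words
```

After a write hit, system memory still holds the old word; it is brought up to date only when the dirty line is evicted.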
Since the cache is of limited size, space must often be allocated within the cache to accommodate a new line of data. An algorithm based on history of use is typically implemented to identify the least necessary line to be overwritten by the new line. A line which is overwritten or copied out of the cache memory when new data is stored in the cache memory is referred to as a victim block or a victim line.
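One common history-based policy is least-recently-used (LRU) replacement. The text above does not name a specific algorithm, so LRU is an assumption here, and the class `LRUCache` with its methods `touch` and `insert` is hypothetical. The sketch shows how a victim line is selected when space must be allocated.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical cache that evicts the least recently used line."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # tag -> line data, least recent first

    def touch(self, tag):
        # Record a use of an existing line by moving it to the "recent" end.
        self.lines.move_to_end(tag)

    def insert(self, tag, data):
        """Store a new line; return the (tag, data) victim, or None."""
        victim = None
        if tag not in self.lines and len(self.lines) >= self.capacity:
            # The least recently used line becomes the victim line.
            victim = self.lines.popitem(last=False)
        self.lines[tag] = data
        self.lines.move_to_end(tag)
        return victim
```

A caller would check whether the returned victim is dirty and, if so, write it back to system memory before discarding it.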
One aspect that affects cache performance and design complexity is the handling of writes initiated by the processor or by an alternate bus master. As explained previously, because two copies of a particular piece of data or instruction code can exist, one in system memory and a duplicate copy in the cache, writes to either the system memory or the cache memory can result in an incoherence between the two storage units. For example, consider the case in which the same data is initially stored at a predetermined address in both the cache memory and the system memory. If the processor subsequently initiates a write cycle to store a new data item at the predetermined address, a cache write "hit" occurs and the processor proceeds to write the new data into the cache memory at the predetermined address. Since the data is modified in the cache memory but not in system memory, the cache memory and system memory become incoherent. Similarly, in systems with an alternate bus master, direct memory access (DMA) write cycles to system memory by the alternate bus master modify data in system memory but not in the cache memory. Again, the cache memory and system memory become incoherent.
An incoherence between the cache memory and system memory during processor writes can be prevented or handled by implementing one of several commonly employed techniques. In a first technique, a "write-through" cache guarantees consistency between the cache memory and system memory by writing the same data to both the cache memory and system memory. The contents of the cache memory and system memory are always identical, and so the two storage systems are always coherent. In a second technique, a "write-back" cache handles processor writes by writing only to the cache memory and setting a "dirty" bit to indicate cache entries which have been altered by the processor. When "dirty" or altered cache entries are later replaced during a "cache replacement" cycle, the modified data is written back into system memory.
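The difference in system memory traffic between the two techniques can be seen in the following sketch. It assumes that every processor write hits in the cache and that all dirty entries are replaced at the end of the run; the function `run_writes` and its arguments are hypothetical.

```python
def run_writes(policy, writes):
    """Apply (address, value) writes under the given policy.

    Returns the final system memory contents and the number of
    system memory writes performed. All writes are assumed to hit.
    """
    cache, memory, dirty = {}, {}, set()
    memory_writes = 0
    for address, value in writes:
        cache[address] = value
        if policy == "write-through":
            # Same data is written to both cache and system memory.
            memory[address] = value
            memory_writes += 1
        elif policy == "write-back":
            # Only the cache is written; the entry is marked dirty.
            dirty.add(address)
        else:
            raise ValueError("unknown policy: " + policy)
    # Cache replacement: dirty entries are written back to system memory.
    for address in sorted(dirty):
        memory[address] = cache[address]
        memory_writes += 1
    return memory, memory_writes
```

Both policies leave system memory with the same final contents, but repeated writes to one address cost one system memory write each under write-through and a single write-back under the write-back policy.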
An incoherence between the cache memory and system memory during a DMA write operation is handled somewhat differently. Depending upon the particular caching architecture employed, one of a variety of bus monitoring or "snooping" techniques may be used. One such technique involves the invalidation of cache entries which become "stale" or inconsistent with system memory after a DMA write to system memory occurs. Another technique involves the "write-back" to system memory of all dirty data within the cache memory prior to the actual writing of data by the alternate bus master. After the dirty data is written back to system memory, the contents of the entire cache are invalidated and the write by the alternate bus master may be performed. Since only a single copy of valid data remains in the system, the DMA write to system memory does not present the problem of possibly "stale" data in the cache.
FIG. 1 and FIGS. 2A-2C are provided to more clearly illustrate the problems associated with an incoherency between the cache memory and system memory during a DMA write operation as well as to illustrate a typical technique for maintaining the integrity of data when such a situation arises.
Referring first to FIG. 1, a block diagram is shown of a typical computer system 100 including a central processing unit (CPU) 102 coupled via a local CPU bus 104 to a cache memory 106 and a cache controller 108. A bus interface unit 110 provides an interface between a system bus 112 and the cache memory 106 and cache controller 108. A system memory 114 is coupled to the system bus 112 through a memory controller 116, and a disk memory unit 118 is coupled to the system bus 112 through a DMA controller 120.
The DMA controller 120 is an alternate bus master that allows data from disk memory unit 118 to be transferred directly into system memory 114 (via memory controller 116) without the supervisory control or intervention of CPU 102. Frequently it is desirable to transfer a relatively large block of data (i.e., 1 Kbyte or larger) from the disk memory unit 118 to system memory 114 during a single DMA request of the alternate bus master. Accordingly, the DMA controller 120 may initiate one or more burst transfer cycles on system bus 112 to sequentially transfer the desired block of data. As is well known to those of skill in the art, during the data phase of a burst cycle, a new word may be provided to system bus 112 from DMA controller 120 for several successive clock cycles without intervening address phases. The fastest burst cycle (no wait states) requires two clock cycles for the first word (one clock for the address, one clock for the corresponding word), with subsequent words returned from sequential addresses on every subsequent clock cycle. For systems based on the particularly popular model 80486 microprocessor, a total of four "doublewords" may be transferred for a given burst cycle.
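The burst timing described above reduces to simple arithmetic, sketched below for the zero-wait-state case only. The function names are hypothetical, and the comparison figure of two clocks per word for non-burst transfers is an assumption made for illustration.

```python
def burst_clocks(words):
    """Clocks for a zero-wait-state burst cycle transferring `words` words."""
    if words < 1:
        raise ValueError("a burst transfers at least one word")
    # Two clocks for the first word (address, then data),
    # then one clock for each subsequent word.
    return 2 + (words - 1)


def non_burst_clocks(words):
    """Assumed cost if every word needed its own address phase."""
    return 2 * words
```

Under these assumptions, a four-doubleword burst occupies the bus for five clocks, compared with eight clocks for four separate two-clock transfers.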
FIGS. 2A-2C are block diagrams that illustrate the flow of data within computer system 100 when a DMA write operation occurs. As will be evident from the following description, the particular data transfers that occur are dependent upon the status of data (i.e., clean, dirty, invalid) within cache memory 106.
Referring to FIG. 2A, a DMA write operation is depicted for a situation wherein the DMA controller 120 desires to write a line of data 200 into a memory region 210 of system memory 114. In this example, line 200 consists of four words "A", "B", "C" and "D", and memory region 210 consists of four address locations "W", "X", "Y" and "Z". When the DMA write operation is initiated, an address tag comparison is made to determine whether cache memory 106 contains a line of data corresponding to the memory region 210 to which line 200 is to be written. If cache memory 106 does not contain a corresponding line, a cache miss occurs and the line 200 of data is transferred into system memory 114. This data transfer is accomplished by executing a single burst write cycle on system bus 112 to write the words "A", "B", "C", and "D" into address locations "W", "X", "Y", and "Z", respectively. Since the cache memory 106 in this situation does not contain a line corresponding to memory region 210, a data incoherency does not exist prior to the execution of or after completion of the data transfer. Therefore there is no need to write-back data or change the status of data within cache memory 106.
FIG. 2B illustrates a similar data transfer that is effectuated when a cache "hit" to "clean" data occurs within the cache memory 106 during a DMA write operation. As mentioned previously, a cache "hit" occurs when the cache memory 106 contains an associated line 220 corresponding to the memory region 210 of system memory 114 (into which data is to be written in accordance with the write instruction being executed). The "hit" line 220 is "clean" if it contains data that is identical to the corresponding data stored within memory region 210 of system memory 114. That is, line 220 is clean if its component words "E", "F", "G" and "H" are identical to the words stored within address locations "W", "X", "Y" and "Z", respectively. As illustrated in the figure, when such a DMA write occurs with clean data in the cache, the line 200 is written into memory region 210 by executing a single burst write cycle on system bus 112. Similar to the previously described transfer of FIG. 2A, the words "A", "B", "C", and "D" are written into address locations "W", "X", "Y", and "Z", respectively. In this case, however, the line 220 residing within cache memory 106 no longer contains the most up-to-date information (i.e., the words "A", "B", "C", and "D" transferred into memory region 210 have become the new valid data). As a consequence, cache controller 108 invalidates the line 220.
Although the transfer of the line 200 into system memory 114 is accomplished by executing a burst write cycle that is predefined to transfer a total of four words per cycle, certain words within line 200 may be marked as invalid to inhibit them from being written into system memory 114. For example, the DMA controller 120 may require that only the words "A", "B" and "C" be accessed from disk memory unit 118 and may not request the fourth word "D". When the burst write cycle is initiated by DMA controller 120 to transfer the words "A", "B" and "C" into system memory 114, the fourth data transfer of the burst cycle involves indeterminate data that should not be stored within system memory 114. To deal with this situation, the DMA controller 120 marks the indeterminate data as invalid. When the invalid data is received by the memory controller 116, it is inhibited from being written into location "Z" of memory region 210. As a result, words "A", "B" and "C" are written into address locations "W", "X" and "Y", respectively, while the data previously stored at address location "Z" of memory region 210 remains unmodified and is not overwritten. The integrity of data within the system is thereby maintained.
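The inhibiting of invalid words during a burst write can be sketched as a per-word mask. The function `burst_write` and the representation of validity as a list of booleans are assumptions for illustration and do not describe the actual signaling between DMA controller 120 and memory controller 116.

```python
def burst_write(memory, base, words, valid):
    """Write a burst of words starting at `base`, skipping invalid words.

    `valid[i]` is False for a word the bus master marked as invalid;
    such a word is inhibited and the old memory contents are kept.
    """
    for offset, (word, is_valid) in enumerate(zip(words, valid)):
        if is_valid:
            memory[base + offset] = word
```

With words "A", "B", "C" valid and the fourth word marked invalid, the first three locations are overwritten and the fourth keeps its previous contents.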
FIG. 2C illustrates the data transfers of a DMA write operation that are effectuated when a cache "hit" occurs with respect to "dirty" data residing within cache memory 106. As mentioned previously, data within the cache is "dirty" if the data has been modified within the cache memory but not in system memory, thus creating a data incoherency. When such a situation arises, the line 200 is transferred into memory region 210 of system memory 114 by executing a burst transfer cycle. This transfer is identical to that described above in conjunction with FIGS. 2A and 2B. As will be explained in greater detail below, after line 200 is transferred into region 210, the line 220 of cache memory 106 is written back into memory region 210 of system memory 114. The write-back to system memory 114 of line 220 is accomplished by executing a second burst transfer cycle. In an alternative embodiment, the line 220 of cache memory 106 may be written back into memory region 210 of the system memory 114 prior to transferring line 200, to accommodate systems which overwrite stale cache lines. Following the write-back to system memory 114, the line 220 within cache memory 106 is marked as invalid.
A similar operation occurs when several words of the line 200 are invalid. For example, if words "A" and "D" are marked invalid, memory controller 116 inhibits them from being written into address locations "W" and "Z". When the write-back of the dirty line 220 within cache memory 106 is executed, memory controller 116 allows words "E" and "H" to be written into address locations "W" and "Z", respectively, and inhibits the writing of words "F" and "G" into address locations "X" and "Y". As a result, words "E", "B", "C" and "H" are stored within address locations "W", "X", "Y" and "Z", respectively, of memory region 210. Data integrity is again maintained.
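The two burst cycles of the dirty-hit case can be sketched together as follows. The function `dma_write_dirty_hit` is hypothetical, and the model assumes that the memory controller permits a written-back word only where the corresponding DMA word was marked invalid, as in the example above.

```python
def dma_write_dirty_hit(memory, base, dma_words, dma_valid, cache_words):
    """Model a DMA write that hits a dirty cache line.

    Returns the new state of the cache line after the operation.
    """
    # First burst: the DMA line is written; invalid words are inhibited.
    for i, is_valid in enumerate(dma_valid):
        if is_valid:
            memory[base + i] = dma_words[i]
    # Second burst: write-back of the dirty line. A cache word is allowed
    # only where the DMA word was invalid, so newer DMA data is preserved.
    for i, is_valid in enumerate(dma_valid):
        if not is_valid:
            memory[base + i] = cache_words[i]
    # The cache line is marked invalid after the write-back.
    return "invalid"
```

Running this sketch with words "A" and "D" marked invalid and a dirty line of "E", "F", "G", "H" leaves "E", "B", "C", "H" in the four address locations, matching the result described above.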
In the computer system 100, the write-back of the dirty line 220 is executed during a DMA write cycle regardless of whether the transferred line 200 encompasses a complete set of valid data. For a situation in which words "A", "B", "C" and "D" of line 200 are all valid, however, the write-back of the dirty line 220 is actually unnecessary since any dirty data residing within the line 220 is effectively replaced (i.e., is invalidated) as a result of the DMA transfer of line 200. Therefore, although the write-back technique has been generally successful in maintaining the integrity of data where an incoherency exists between the cache memory and system memory prior to a DMA write operation, the bandwidth of the computer system 100 becomes limited since the system bus 112 is occupied with unnecessary bus traffic during the time when the line 220 is written back to system memory 114 (if line 200 contains a complete set of valid data). As a result, overall system performance may be degraded.
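The cost of the unconditional write-back can be estimated with the zero-wait-state burst timing described earlier (two clocks for the first word, one clock for each subsequent word). The sketch below is an estimate under that assumption only; the function names are hypothetical.

```python
def burst_clocks(words):
    """Zero-wait-state burst: 2 clocks for the first word, 1 for each other."""
    return 2 + (words - 1)


def dirty_hit_bus_clocks(dma_valid):
    """Return (total clocks, clocks spent on an unnecessary write-back).

    `dma_valid` lists, per word of the DMA line, whether the word is valid.
    The write-back is always executed; it is unnecessary only when every
    word of the DMA line is valid.
    """
    words = len(dma_valid)
    dma_burst = burst_clocks(words)
    write_back = burst_clocks(words)
    wasted = write_back if all(dma_valid) else 0
    return dma_burst + write_back, wasted
```

For a four-word line in which every DMA word is valid, this estimate gives ten bus clocks in total, five of which carry write-back data that has already been superseded.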