1. Field of the Invention
The present invention relates to cache coherency control. More particularly, it relates to a write back cache coherency control system and method for computer systems having a bus that does not support write back caches.
A cache memory is a small, very fast memory used to store frequently used instructions and data. Cache memory is a tradeoff between speed and cost. Ideally, a computer would have main memory that is as fast as the CPU. Such memory exists, but it is much more expensive than conventional main memory. Fortunately, a phenomenon known as the locality of execution principle makes caching possible. This principle states that the CPU tends to use the same memory locations repeatedly. In general, the CPU follows the 80/20 rule, which states that 20% of the memory addresses are used 80% of the time. Consequently, a small, very fast and expensive memory can be used in conjunction with a much larger, slower and inexpensive main memory to gain most of the performance of an all-high-speed main memory at a small fraction of the cost.
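To make the speed/cost tradeoff concrete, a common simplified model (not stated in the text) gives the average access time as the cache and main memory access times weighted by the hit rate. The 80/20 rule describes locality rather than a hit rate, so the hit rates and the 10 ns / 70 ns access times below are hypothetical figures chosen only for illustration.

```python
def effective_access_time(hit_rate: float, t_cache_ns: float, t_main_ns: float) -> float:
    """Average access time when hits are served by the cache and misses by main memory.

    Simplified model: a miss costs only the main memory access time; the time
    spent checking the cache first is ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * t_cache_ns + (1.0 - hit_rate) * t_main_ns


# Hypothetical figures, not from the text: 10 ns cache, 70 ns main memory.
for h in (0.0, 0.8, 0.95):
    print(f"hit rate {h:.2f}: {effective_access_time(h, 10, 70):.1f} ns")
```

With these assumed figures, a 95% hit rate brings the average access time down to 13 ns, against 70 ns for main memory alone.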
A typical cache memory consists of very fast static RAM and a very fast cache controller. In operation, when the CPU requests data from main memory, the cache controller uses very fast logic to check whether the data is in the cache. Ideally, the speed of the cache is matched to the speed of the CPU; with on-board caches this happens automatically. If the requested data is in the cache, it is delivered much more quickly than if it had to come from main memory. This situation is called a cache hit. If the requested data is not in the cache, the CPU must access the slower main memory for the data. This is called a cache miss. Data retrieved during a cache miss is also written to the cache for future use. To write new data into the cache, other data must be evicted. This is called line replacement. The cache controller performs this function and attempts to keep the most frequently used data in the cache.
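The hit, miss and line replacement cycle can be sketched with a toy cache. This is a minimal sketch under stated assumptions: the cache is fully associative, each line holds a fixed number of words, and least-recently-used (LRU) replacement stands in for the controller's goal of keeping the most frequently used data. The class name `SimpleCache` is hypothetical; a real cache controller does this in hardware.

```python
from collections import OrderedDict


class SimpleCache:
    """Toy fully associative cache with LRU line replacement (assumed policy)."""

    def __init__(self, num_lines: int, line_size: int, main_memory: dict):
        self.num_lines = num_lines
        self.line_size = line_size
        self.main_memory = main_memory   # address -> value
        self.lines = OrderedDict()       # line tag -> cached words, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address: int):
        tag = address // self.line_size
        if tag in self.lines:
            self.hits += 1                       # cache hit: serve from the cache
            self.lines.move_to_end(tag)          # mark line as most recently used
        else:
            self.misses += 1                     # cache miss: fetch from main memory
            if len(self.lines) >= self.num_lines:
                self.lines.popitem(last=False)   # line replacement: evict the LRU line
            base = tag * self.line_size
            self.lines[tag] = [self.main_memory.get(base + i, 0)
                               for i in range(self.line_size)]
        return self.lines[tag][address % self.line_size]


memory = {addr: 2 * addr for addr in range(64)}
cache = SimpleCache(num_lines=2, line_size=4, main_memory=memory)
for addr in (0, 1, 4, 0, 8, 4):
    cache.read(addr)
print(cache.hits, cache.misses)   # 2 4
```

Because this toy only reads, an evicted line can simply be discarded; a cache that holds modified data must first write it back, which is the subject of the write back discussion that follows.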
However, there are architectural problems with a cache memory. A first problem occurs when the CPU has modified data in cache memory and an external device needs to read that data. The CPU has copied the data from some locations in main memory into its cache memory, and it may then use and modify that data in the cache. Certain locations in main memory then hold data that differs from the data in the corresponding locations in the cache. This does not matter to the CPU, since it always looks first in the cache and thus has the most current data. However, if an external device (one other than the CPU) needs the same data, it would read what is called "stale" data from main memory (the modified copy in the cache is referred to as "dirty").
A second problem occurs when the CPU holds data in its cache (whether or not it has modified that data) and an external device writes new data into the corresponding locations in main memory. In this case, the data in the cache becomes stale.
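Both problems can be reproduced with plain dictionaries standing in for main memory and the CPU's cache. This toy model (addresses and values are arbitrary) contains no coherency mechanism at all; it only shows how the two copies diverge.

```python
main_memory = {0x100: 1}     # address -> value
cpu_cache = {}

cpu_cache[0x100] = main_memory[0x100]    # CPU caches the location
cpu_cache[0x100] = 2                     # ...and modifies it without writing memory

# Problem 1: an external device reads main memory and gets the old value.
device_read = main_memory[0x100]
print("device reads", device_read, "while the CPU holds", cpu_cache[0x100])

# Problem 2: an external device writes main memory; the cached copy is now stale.
main_memory[0x100] = 3
print("memory holds", main_memory[0x100], "while the CPU cache holds", cpu_cache[0x100])
```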
These two situations give rise to the general problem known as cache coherency: how does the system ensure that the CPU and all external devices use the same data when a cache memory is employed?
The solution is, of course, simple when the CPU directly supervises all accesses to main memory. However, modern high-performance architectures allow external devices to access main memory directly.
The solution to the external device read problem depends on the write policy: when the CPU modifies data in its cache, when does it cause the modified data to be written to main memory? There are two approaches, known as write through and write back.
In a write through cache, whenever the CPU modifies any data in the cache, it immediately writes the new information to the corresponding locations in main memory. This approach ensures that any device that accesses main memory sees exactly the same data as the cache holds.
In a write back cache scheme, the CPU does not update main memory on every write cycle. Rather, the CPU continues to operate out of the cache, and writes modified data back to main memory only when an external device seeks access to memory locations that have been cached or when a cache line that has been modified must be replaced.
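The difference between the two policies shows up in how often main memory is written. In this sketch, which tracks single addresses rather than whole cache lines for brevity, ten CPU writes to one location cost ten memory writes under write through but none under write back until the modified entry is written back (for example, when an external device seeks access to it or it is replaced). The class and method names are hypothetical.

```python
class WriteCache:
    """Toy cache contrasting write through and write back (per-address entries)."""

    def __init__(self, memory: dict, write_back: bool):
        self.memory = memory
        self.write_back = write_back
        self.data = {}           # address -> cached value
        self.dirty = set()       # addresses modified in cache but not yet in memory
        self.memory_writes = 0

    def write(self, address: int, value: int) -> None:
        self.data[address] = value
        if self.write_back:
            self.dirty.add(address)          # defer the main memory update
        else:
            self.memory[address] = value     # write through immediately
            self.memory_writes += 1

    def write_back_entry(self, address: int) -> None:
        """Copy a modified entry to memory (e.g. on external access or replacement)."""
        if address in self.dirty:
            self.memory[address] = self.data[address]
            self.memory_writes += 1
            self.dirty.discard(address)


wt = WriteCache({}, write_back=False)
wb = WriteCache({}, write_back=True)
for v in range(10):
    wt.write(0x40, v)
    wb.write(0x40, v)
print(wt.memory_writes, wb.memory_writes)   # 10 0
wb.write_back_entry(0x40)
print(wb.memory_writes, wb.memory[0x40])    # 1 9
```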
In order to implement cache coherency, system logic external to the CPU is provided. This logic typically resides in a chip set on the motherboard. The system logic monitors the address and control busses for requests from external devices seeking access to main memory addresses that have been cached. This is known as a snoop or an inquire cycle. The inquire cycle occurs regardless of whether the cache is write through or write back. When an external device seeks to write to a main memory address that has been cached by the CPU, an action known as "invalidate" is invoked: the system logic detects the attempt by the external device to write to main memory and runs an inquire/invalidate cycle to the CPU. In a write through cache, this results in the CPU invalidating the cache line that contains the data being modified by the external device.
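The write through snoop path can be sketched as follows. The cycle representation, the 16-byte line size and the function names are assumptions made for illustration; in a real system this is chip-set logic running an inquire/invalidate cycle to the CPU, not software. Because main memory is always current under write through, an external read needs no action from the cache in this model; only external writes to cached addresses cause an invalidation.

```python
class WriteThroughCPU:
    """Toy CPU cache that only tracks which lines are valid (write through)."""

    def __init__(self, line_size: int = 16):
        self.line_size = line_size
        self.valid_lines = set()    # tags of lines currently held in the cache

    def inquire_invalidate(self, address: int) -> bool:
        """Inquire/invalidate cycle: drop the line holding address; True on a snoop hit."""
        tag = address // self.line_size
        if tag in self.valid_lines:
            self.valid_lines.remove(tag)
            return True
        return False


def system_logic_snoop(cpu: WriteThroughCPU, kind: str, address: int) -> bool:
    """Hypothetical chip-set logic reacting to one external bus cycle."""
    if kind == "write":
        # An external write to cached memory would otherwise leave the cache stale.
        return cpu.inquire_invalidate(address)
    # Under write through, main memory is always current, so reads need no action.
    return False


cpu = WriteThroughCPU()
cpu.valid_lines.add(0x100 // cpu.line_size)        # CPU has cached 0x100-0x10F
print(system_logic_snoop(cpu, "read", 0x104))      # False: memory already current
print(system_logic_snoop(cpu, "write", 0x104))     # True: line invalidated
print(cpu.valid_lines)                             # set()
```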
In the case of a write back cache, if the inquire cycle results in a hit to modified data in the cache, a back off command is issued to the inquiring external device. When the external device receives a back off command, it stops the operation it was performing and gives control of the bus back to the CPU. The CPU then writes the modified data from the cache to the corresponding locations in main memory, invalidates the cache line containing the modified data, and returns control of the bus to the external device. The external device then re-executes the cycle that was interrupted by the back off command.
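The back off sequence for a write back cache can be modeled as an ordered log of bus events. This is a simplified sketch: the 4-word line size, the event strings and all names are hypothetical, and the back off, write back and retry steps that real hardware performs with bus signals are represented as function calls. A snoop hit to an unmodified line on an external write is simply invalidated, as in the write through case.

```python
class WriteBackCPU:
    """Toy write back cache: lines keyed by tag, with a set of modified tags."""

    def __init__(self, memory: dict, line_size: int = 4):
        self.memory = memory
        self.line_size = line_size
        self.lines = {}          # tag -> list of cached words
        self.modified = set()    # tags whose cached words differ from memory

    def inquire(self, address: int) -> tuple:
        """Snoop: report (hit, modified) for the line containing address."""
        tag = address // self.line_size
        return tag in self.lines, tag in self.modified

    def write_back(self, address: int) -> None:
        tag = address // self.line_size
        base = tag * self.line_size
        for offset, word in enumerate(self.lines[tag]):
            self.memory[base + offset] = word
        self.modified.discard(tag)

    def invalidate(self, address: int) -> None:
        self.lines.pop(address // self.line_size, None)


def external_cycle(cpu: WriteBackCPU, kind: str, address: int, value: int = 0):
    """Run one external read or write, returning (data, event log)."""
    log = []
    hit, modified = cpu.inquire(address)
    if hit and modified:
        log.append("back off")           # device stops and releases the bus
        cpu.write_back(address)          # CPU copies the modified line to memory
        cpu.invalidate(address)          # ...and invalidates it
        log.append("write back + invalidate")
        log.append("retry")              # device re-executes the interrupted cycle
    elif hit and kind == "write":
        cpu.invalidate(address)          # clean line: invalidate only
        log.append("invalidate")
    if kind == "read":
        log.append("read")
        return cpu.memory.get(address, 0), log
    cpu.memory[address] = value
    log.append("write")
    return None, log


memory = {addr: 0 for addr in range(8)}
cpu = WriteBackCPU(memory)
cpu.lines[0] = [7, 0, 0, 0]              # CPU has modified word 0 in its cache
cpu.modified.add(0)
data, log = external_cycle(cpu, "read", 0)
print(data, log)  # 7 ['back off', 'write back + invalidate', 'retry', 'read']
```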
Thus, in a conventional implementation of a cache memory, the bus is designed to support one of the two coherency schemes: write through or write back. In a system supporting a write through cache, the bus is designed to support cache invalidation. In a system supporting a write back cache, the bus must provide not only the signals that support cache invalidation but also signals that support inquiry and write back.
However, there is a very large installed base of computers that either do not support a write back cache or support no cache at all, and a significant number of their users may be interested in upgrading these computers by retrofitting them with a microprocessor containing a cache memory. This would be a low-cost approach to improved performance. To do this, however, it is necessary to develop a way of maintaining cache coherency on a computer bus that lacks the conventional hardware support for a cache memory.
Of the applications referenced in the Cross-reference to Related Applications section of this application, the Ghori application and the second Kulkarni application describe two ways to implement a write through cache on a computer system that supports no cache. The second Kulkarni application deals with both.
The Ghori application describes a cache coherency module ("CCM"), added to the microprocessor chip, that snoops the addresses put on the bus. When the CCM detects that the DMA controller is programmed to write to main memory, the entire cache is flushed, and the page in main memory that is being written to (in MS-DOS systems, main memory is divided into pages of 64K bytes) is kept non-cacheable until the data write is complete. Completion of the data write is determined by monitoring, with a software-initiated interrupt, the application software's interrogation of the DMA status register. The cache is turned off completely when cascade mode is used, since the CPU does not know which areas of memory may become incoherent.
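The Ghori-style policy can be summarized as a small state machine. In this sketch the flush is modeled by clearing a set of cached line addresses, and the detection of DMA programming and of write completion (which the text attributes to address snooping and to monitoring of the DMA status register) is reduced to hypothetical method calls. Only the 64K-byte page size comes from the text.

```python
PAGE_SIZE = 64 * 1024   # 64K-byte pages, as described for MS-DOS systems


class GhoriStyleCCM:
    """Sketch of the flush / non-cacheable-page policy described for Ghori."""

    def __init__(self):
        self.cached_lines = set()          # stands in for the cache contents
        self.non_cacheable_pages = set()   # pages currently being written by DMA
        self.cache_enabled = True

    def on_dma_write_programmed(self, dest_address: int) -> None:
        self.cached_lines.clear()                          # flush the entire cache
        self.non_cacheable_pages.add(dest_address // PAGE_SIZE)

    def on_dma_write_complete(self, dest_address: int) -> None:
        self.non_cacheable_pages.discard(dest_address // PAGE_SIZE)

    def on_cascade_mode(self) -> None:
        self.cache_enabled = False         # CPU cannot tell which memory may go stale

    def is_cacheable(self, address: int) -> bool:
        return self.cache_enabled and address // PAGE_SIZE not in self.non_cacheable_pages


ccm = GhoriStyleCCM()
ccm.cached_lines.update({0x1000, 0x35000})
ccm.on_dma_write_programmed(0x23456)       # DMA will write into page 2
print(ccm.is_cacheable(0x20000), ccm.is_cacheable(0x30000), len(ccm.cached_lines))
ccm.on_dma_write_complete(0x23456)
print(ccm.is_cacheable(0x20000))
```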
The second Kulkarni application deals exclusively with write through caches on busses that do not support a write through cache. Its method detects the period during which the DMA controller is programmed to allow an external device to write to main memory, and flushes the cache memory at the end of that period. In addition, the method detects all reads by the CPU to non-standard I/O and memory space, and all non-standard interrupts, that occur while the DMA controller is programmed to allow an external device to write to main memory, and flushes the cache memory after each such detection.
The apparatus of the second Kulkarni application, which implements the method of that application, is a cache coherency module consisting of: a bus snooping sub-module connected to the bus for monitoring address, control and data signals on the bus; a DMA address table containing the addresses of the registers of the DMA controller; a system address table containing at least the addresses of all non-standard I/O and memory and the non-standard interrupts; and a logic sub-module connected to the cache memory and the bus snooping sub-module and communicating with the DMA address table and the system address table. The logic sub-module uses the information in these tables to interpret the monitored bus signals, and supplies a signal to the cache memory causing it to be flushed (1) upon an indication that a non-standard I/O read has occurred while the DMA controller is programmed to allow the external I/O device to write to main memory and (2) at the end of the period during which the DMA controller is programmed to allow the external I/O device to write to main memory. The cache coherency module is further adapted to recognize the presence of a bus master device in the computer system and thereafter always to supply a signal causing the cache memory to flush upon an indication of a non-standard I/O read or a non-standard interrupt, regardless of the status of the DMA controller.
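The method and apparatus just described can be sketched as a table-driven module. Every port number, memory address and interrupt number in the example tables is a hypothetical placeholder, and detecting DMA programming is simplified to writes to two designated DMA register addresses; a real module would decode the DMA controller's register contents. The sketch flushes on non-standard reads and non-standard interrupts during the DMA write window, at the end of the window, and, once a bus master has been recognized, on such reads and interrupts regardless of DMA status.

```python
from dataclasses import dataclass


@dataclass
class CoherencyModule:
    """Sketch of the flush triggers described for the second Kulkarni application."""
    dma_table: dict                  # DMA register address -> "window_start" / "window_end"
    nonstd_io: set                   # non-standard I/O ports (system address table)
    nonstd_mem: set                  # non-standard memory addresses (system address table)
    nonstd_irq: set                  # non-standard interrupt numbers (system address table)
    dma_write_window: bool = False
    bus_master_present: bool = False
    flushes: int = 0

    def _flush(self) -> None:
        self.flushes += 1            # stands in for the flush signal to the cache

    def observe(self, kind: str, address: int) -> None:
        """Interpret one snooped cycle: kind is io_write, io_read, mem_read or irq."""
        if kind == "io_write" and address in self.dma_table:
            role = self.dma_table[address]
            if role == "window_start":
                self.dma_write_window = True
            elif role == "window_end" and self.dma_write_window:
                self.dma_write_window = False
                self._flush()        # flush at the end of the DMA write period
            return
        nonstd_read = (kind == "io_read" and address in self.nonstd_io) or (
            kind == "mem_read" and address in self.nonstd_mem)
        nonstd_int = kind == "irq" and address in self.nonstd_irq
        if (nonstd_read or nonstd_int) and (self.dma_write_window or self.bus_master_present):
            self._flush()


ccm = CoherencyModule(dma_table={0xE0: "window_start", 0xE1: "window_end"},
                      nonstd_io={0x300}, nonstd_mem={0xD0000}, nonstd_irq={11})
ccm.observe("io_read", 0x300)     # outside the window: no flush
ccm.observe("io_write", 0xE0)     # DMA programmed for a write to memory
ccm.observe("io_read", 0x300)     # non-standard read in the window: flush
ccm.observe("irq", 11)            # non-standard interrupt in the window: flush
ccm.observe("io_write", 0xE1)     # end of the window: flush
print(ccm.flushes)                # 3
ccm.bus_master_present = True     # bus master recognized
ccm.observe("irq", 11)            # now flushes regardless of DMA status
print(ccm.flushes)                # 4
```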
The first Kulkarni application cross-referenced above deals with write back cache coherency for systems having busses that support either write through caching only or no caching at all; however, that application does not address all conditions that can create incoherency, and it handles others in a less than optimal way.
In the first Kulkarni application, the cache is flushed rather than synchronized. To flush means to invalidate the entire contents of the cache. Where a cache is used on a bus that does not support caching at all, the cache must be flushed. But where the objective is to support a write back cache in a write through system, the high-performance approach is to synchronize the cache and main memory. The terms "synchronized" and "coherent" are used interchangeably here. The first Kulkarni solution would work, but it would be slower, because after each flush the CPU must refetch from main memory data that was still valid in the cache.
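The cost difference between flushing and synchronizing can be seen in a toy write back cache. In this sketch (names hypothetical, per-address granularity), `flush` writes modified data back and then invalidates everything, which a write back cache must do to avoid losing modified data, while `synchronize` writes modified data back but keeps every entry valid. Counting misses when the same data is read again shows the price of the flush.

```python
class WBCache:
    """Toy write back cache used to compare flushing with synchronizing."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.lines = {}          # address -> cached value
        self.dirty = set()       # addresses modified in the cache only
        self.misses = 0

    def read(self, addr: int) -> int:
        if addr not in self.lines:
            self.misses += 1
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value
        self.dirty.add(addr)

    def _write_back_dirty(self) -> None:
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

    def synchronize(self) -> None:
        self._write_back_dirty()     # memory now matches the cache; entries stay valid

    def flush(self) -> None:
        self._write_back_dirty()     # modified data must not be lost
        self.lines.clear()           # invalidate the entire contents


mem_a, mem_b = {}, {}
synced, flushed = WBCache(mem_a), WBCache(mem_b)
for cache in (synced, flushed):
    for addr in range(8):
        cache.read(addr)
    cache.write(0, 99)
synced.synchronize()
flushed.flush()
for cache in (synced, flushed):
    for addr in range(8):
        cache.read(addr)
print(synced.misses, flushed.misses)   # 8 16
print(mem_a[0], mem_b[0])              # 99 99
```

Both approaches leave main memory coherent; only the flushed cache pays a second round of misses.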
The present invention also differs from the first Kulkarni application in the following ways. The first Kulkarni application uses a read to a non-standard address or a non-standard interrupt to indicate the end of a data transfer. With a write back cache, there is no need to be concerned with the end of a data transfer, since the cache and main memory were synchronized immediately before the transfer. In addition, with a write back cache system, it is necessary to synchronize on every non-standard interrupt. This is necessary because the CPU could set up a transfer by a write to either non-standard I/O space or non-standard memory space, using a protocol that indicates completion of the transfer by a non-standard interrupt. In that circumstance, if the write back cache memory and main memory are not synchronized upon detection of the non-standard interrupt, incoherency could occur. The incoherency would arise as follows:
The CPU and a busmaster card communicate via a protocol that sets up locations in memory referred to as mailboxes. The CPU leaves instructions for the busmaster device in the mailboxes. The protocol also lets the busmaster device tell the CPU, by way of a non-standard interrupt, that it has finished the task in a mailbox.
The incoherency can occur when there is more than one mailbox, which the protocol permits. Suppose that the CPU has filled two mailboxes with instructions: mailbox 1 has instructions for memory area 1 and mailbox 2 has instructions for memory area 2. After completing the instructions in mailbox 1, the busmaster sends a non-standard interrupt to the CPU telling it that the task defined in mailbox 1 is finished. The busmaster device then performs the tasks set out in mailbox 2. Meanwhile, the CPU can work with the data in memory area 1, since it has been told that the busmaster has completed its assigned job. The CPU may also set up new and different instructions in mailbox 1. When the busmaster completes the instructions in mailbox 2, it again sends a non-standard interrupt to the CPU indicating that it has completed its task. The busmaster then returns to mailbox 1 looking for additional tasks. Because the CPU's new commands in mailbox 1 and/or its new data in memory area 1 may still reside only in the write back cache, the busmaster would read stale commands or data from main memory unless the cache is synchronized upon the non-standard interrupt signaling completion of the commands in mailbox 2. This case arises when the CPU does not signal that new commands have been posted in mailbox 1, or that new data is available in memory area 1, by a write to a non-standard I/O or memory location.
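The mailbox scenario can be traced in a few lines. This is a hypothetical model: mailbox contents are strings, the CPU's write back cache is a dictionary with a set of modified keys, and the busmaster reads main memory directly. Running it with and without synchronization on each non-standard interrupt shows which version of mailbox 1 the busmaster finds when it returns.

```python
def run_mailbox_scenario(sync_on_interrupt: bool) -> str:
    """Return what the busmaster reads from mailbox 1 after finishing mailbox 2."""
    memory = {"mailbox1": "cmd A (done)", "mailbox2": "cmd B"}
    cpu_cache = {}
    dirty = set()

    def cpu_write(key: str, value: str) -> None:
        cpu_cache[key] = value       # write back: main memory is not updated yet
        dirty.add(key)

    def synchronize() -> None:
        for key in dirty:
            memory[key] = cpu_cache[key]
        dirty.clear()

    def non_standard_interrupt() -> None:
        if sync_on_interrupt:
            synchronize()

    non_standard_interrupt()          # busmaster signals: mailbox 1 finished
    cpu_write("mailbox1", "cmd C")    # CPU posts a new command, with no non-standard write
    non_standard_interrupt()          # busmaster signals: mailbox 2 finished
    return memory["mailbox1"]         # busmaster returns to mailbox 1 via main memory


print(run_mailbox_scenario(sync_on_interrupt=False))   # cmd A (done)
print(run_mailbox_scenario(sync_on_interrupt=True))    # cmd C
```

Without synchronization on the second interrupt, the busmaster finds the old, already completed command; with it, the new command has reached main memory.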