1. Field of the Invention
This invention is directed to digital computers, and more particularly to a multi-processor system following a cache ownership protocol. Specifically, the invention relates to processing writes to a cache and writebacks from the cache to a main memory when the cache encounters certain error conditions under which data in the cache should be ignored unless that data is owned by the cache.
2. Description of the Background Art
Processors in a multi-processor computer system typically communicate via a shared memory. To improve system performance, each processor has a cache memory for temporarily storing copies of data being accessed. Such a hierarchical memory system may follow either a "write-through" or a "writeback" protocol. In a "write-through" protocol, a processor immediately writes data to the shared memory so that any other processor may fetch the most recent memory state from the shared memory. In a "writeback" protocol, a processor writes data to its cache, but this new memory state is written back to the shared memory only when the memory space in the cache needs to be used for different addresses in a cache fill operation, or when another processor needs the new memory state. Therefore, the writeback protocol reduces the number of memory access operations to the shared memory when the new memory state is not needed by the other processors. In general, the write-through protocol is preferred when different processors frequently access the same shared memory addresses, and the writeback protocol is preferred when the different processors infrequently access the same shared memory addresses.
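By way of illustration only, the difference in shared-memory traffic between the two protocols may be sketched as follows. This is a minimal model under stated assumptions, not an implementation from the referenced art: the class names, the write counter, and the helper count_memory_writes are all hypothetical, and the writeback is triggered explicitly rather than by a cache fill or a request from another processor.

```python
class SharedMemory:
    """Hypothetical shared memory that counts write accesses."""
    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, addr, value):
        self.data[addr] = value
        self.writes += 1


class WriteThroughCache:
    """Every processor write is forwarded to shared memory immediately."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory.write(addr, value)


class WriteBackCache:
    """Writes stay in the cache until the block is written back."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def write_back(self, addr):
        # In the text this happens on a cache fill to the same location or
        # when another processor needs the data; here it is called directly.
        if addr in self.dirty:
            self.memory.write(addr, self.lines[addr])
            self.dirty.discard(addr)


def count_memory_writes(cache_cls, n_writes):
    """Write the same address n_writes times, then flush if applicable."""
    memory = SharedMemory()
    cache = cache_cls(memory)
    for i in range(n_writes):
        cache.write(0x100, i)
    if hasattr(cache, "write_back"):
        cache.write_back(0x100)
    return memory.writes, memory.data[0x100]
```

In this sketch, five writes to one address cost five shared-memory accesses under write-through and a single access under writeback, while the final memory state is the same in both cases.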
Whenever processors communicate via a shared memory, it is desirable to require the processors to follow a protocol ensuring that a memory address is not written to simultaneously by more than one processor, or else the result of one processor will be nullified by the result of another processor. Such synchronization of memory access is commonly achieved by requiring a processor to obtain an exclusive privilege to write to an addressed portion of the shared memory, before executing a write operation. In a multi-processor system employing writeback caches, such an exclusive privilege gives rise to a cache coherency problem in which data written in the cache of a processor having such an exclusive privilege might be the only valid copy of data for the addressed portion of memory. A cache coherency protocol is required which permits a processor to obtain readily the valid copy of data as well as the privilege to write to it.
One known cache coherency protocol for a multi-processor system employing writeback caches is based on the concept of block ownership; an addressed portion of memory the size of a cache block is either owned by the shared memory or it is owned by one of the writeback caches. Only one of the processors, or the shared memory, may own the block of memory at any given time, and this ownership is indicated by an ownership bit for each block in the shared memory and in each of the caches. A processor may write to a block only when the processor owns the block. Therefore the ownership bits always identify a unique "valid" copy of the block data in the system. Shared read-only access to a block is permitted only when the shared memory owns the block. To indicate whether a processor may read a block, each of the caches includes, for each block, a "valid" bit. When a processor desires to read a block that is not valid in its cache, it issues a read transaction to the shared memory, requesting the shared memory to fill its cache with valid data. When a processor desires to write to a block which it does not own, it issues an ownership-read transaction to the shared memory, requesting ownership as well as a fill. From the perspective of the other processors, these transactions are cache coherency transactions, which request any other processor having ownership to give up ownership and write back the data of the requested block, and in the case of an ownership-read transaction, further request the other processors to invalidate any copies of the requested block.
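The ownership rules described above may be sketched, for illustration, as a small model. The names System, MEMORY, read, write and _reclaim are hypothetical, and several simplifications are assumed: a block is a single value, presence of a block in a cache dictionary stands for its "valid" bit, a single owner table stands for the per-block ownership bits, and transactions complete instantly rather than travelling over a bus.

```python
MEMORY = "memory"  # hypothetical marker meaning the shared memory owns the block


class System:
    def __init__(self, n_caches):
        self.memory = {}   # block -> data held by the shared memory
        self.owner = {}    # block -> MEMORY or index of the owning cache
        # One dict per cache; a block present in the dict is "valid" there.
        self.caches = [dict() for _ in range(n_caches)]

    def _reclaim(self, block):
        """Coherency request: the owning cache writes back and gives up ownership."""
        owner = self.owner.get(block, MEMORY)
        if owner != MEMORY:
            self.memory[block] = self.caches[owner][block]
            self.owner[block] = MEMORY

    def read(self, cpu, block):
        """Read transaction: fill the cache if the block is not valid in it."""
        cache = self.caches[cpu]
        if block not in cache:
            self._reclaim(block)
            cache[block] = self.memory.get(block, 0)
        return cache[block]

    def write(self, cpu, block, value):
        """A write requires ownership; otherwise issue an ownership-read first."""
        if self.owner.get(block, MEMORY) != cpu:
            self._reclaim(block)
            for i, other in enumerate(self.caches):
                if i != cpu:
                    other.pop(block, None)  # invalidate other copies
            self.caches[cpu][block] = self.memory.get(block, 0)  # fill
            self.owner[block] = cpu
        self.caches[cpu][block] = value
```

In this model a read by a second processor forces the owner to write the block back and return ownership to the shared memory, while a write by a second processor additionally invalidates the first processor's copy.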
Typically the time for a cache coherency transaction to be transmitted over a system bus is much shorter than the time for fill data to be retrieved from the shared memory. Therefore system performance can be improved by permitting more than one transaction to be pending on the bus at any given time.
As set out in Ser. No. 07/547,699 filed Jun. 29, 1990, entitled BUS PROTOCOL FOR HIGH-PERFORMANCE PROCESSOR, by Rebecca L. Stamm et al., it is desirable to queue transactions from a processor and an associated cache before the transactions are issued onto a bus to a main memory and other processors and caches in a multi-processor system. Outgoing non-writeback transactions are stored in a first queue, and outgoing writeback transactions are stored in a second queue. The separate queuing of writeback and non-writeback requests is used to give priority to writeback transactions during periods of high loading. When a system unit, such as an I/O unit, for example, has received more than a certain number of cache coherency transactions yet to be processed, it suppresses issue of the non-writeback transactions from the first queue but permits the writeback transactions to be issued from the second queue.
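The separate queuing described above may be illustrated by the following sketch. It is not the arbitration logic of the referenced application: the class OutgoingQueues, its methods, and the suppress_non_writeback flag are hypothetical names, and issuing in arrival order when no suppression is in effect is an assumption made only to keep the example complete.

```python
from collections import deque


class OutgoingQueues:
    """Two outgoing queues: one for writebacks, one for everything else."""
    def __init__(self):
        self.non_writeback = deque()
        self.writeback = deque()
        self._seq = 0  # arrival counter, used only for the assumed default order

    def enqueue(self, name, is_writeback):
        queue = self.writeback if is_writeback else self.non_writeback
        queue.append((self._seq, name))
        self._seq += 1

    def issue(self, suppress_non_writeback=False):
        """Return the next transaction to drive onto the bus, or None."""
        eligible = [self.writeback]
        if not suppress_non_writeback:
            eligible.append(self.non_writeback)
        eligible = [q for q in eligible if q]
        if not eligible:
            return None
        # Assumption: without suppression, the oldest queue head is issued first.
        oldest = min(eligible, key=lambda q: q[0][0])
        return oldest.popleft()[1]
```

While suppression is asserted, only entries in the writeback queue are issued, so a writeback can reach the bus ahead of a non-writeback transaction that was queued earlier.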
As set out in our Ser. No. 07/547,597, filed Jun. 29, 1990, entitled ERROR TRANSITION MODE FOR MULTI-PROCESSOR SYSTEM, by Rebecca L. Stamm et al., issued on Oct. 13, 1992 as U.S. Pat. No. 5,155,843, a writeback cache may encounter certain error conditions for which data in the cache should be ignored unless that data is owned by the cache. In this case, the cache is put into a state called "Error Transition Mode" (ETM). In ETM, the cache is used as little as possible, and the state of the cache is preserved as much as possible for diagnostic software. In ETM, when a processor makes a memory request for data not owned by the cache, any data in the cache is ignored, and the data is obtained from main memory; and when the processor makes a memory read request for data owned by the cache, the data is obtained from the cache.
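The read behavior in ETM described above may be sketched as follows. The CacheBlock type and the read function are hypothetical illustrations, not the logic of the referenced patent; the cache and main memory are modeled as plain dictionaries, and the cache fill that would normally follow a miss outside ETM is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class CacheBlock:
    data: int
    valid: bool = False
    owned: bool = False


def read(cache, main_memory, addr, etm):
    """Return (data, source) for a processor read of addr."""
    block = cache.get(addr)
    if etm:
        # In ETM only owned data is trusted; everything else in the cache
        # is ignored and the data is obtained from main memory.
        if block is not None and block.owned:
            return block.data, "cache"
        return main_memory[addr], "memory"
    # Outside ETM any valid block may satisfy the read.
    if block is not None and block.valid:
        return block.data, "cache"
    return main_memory[addr], "memory"
```

For example, a block that is valid but not owned is read from the cache in normal operation and from main memory in ETM, whereas an owned block is read from the cache in both cases.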
A "write ordering bug" may occur if a cache is operated in the ETM mode and writeback and non-writeback requests are separately queued. Suppose, for example, that write data from memory requests to blocks of data owned by the cache were immediately written to the cache, and written back to main memory with the blocks of data owned by the cache upon receipt of cache coherency transactions, in the usual fashion, for both ETM mode and non-ETM mode. Suppose, in ETM mode, a first write misses in the cache and is sent to the non-writeback queue, on its way to main memory, and a second following write to a block in the cache hits owned and is written to the cache. Then a cache coherency request to invalidate the block is received from the system bus, and the block of data (including write data for the second write) is placed in the writeback queue. If the first write has not yet reached the system bus when the block of data is placed in the writeback queue, then the block of data in the writeback queue may pass the first write and be asserted on the system bus before the first write when the writeback queue is given priority over the non-writeback queue. If this should happen, the system sees the second write while it may be reading an old version of the data of the first write, because the first write has not yet reached the main memory. This is a write-ordering problem, because the order of the writes as seen by the system is different from the order of the writes as issued by a processor.
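The sequence of events just described may be reproduced in a short simulation. The function simulate and its writeback_priority parameter are hypothetical, block names "A" and "B" are placeholders, and the bus is reduced to a list recording the order in which the two writes become visible to the system.

```python
from collections import deque


def simulate(writeback_priority):
    """Return the order in which two processor writes reach the system bus."""
    non_writeback = deque()
    writeback = deque()
    cache = {"B": "old B"}  # block B is owned by the cache

    # First write misses in the cache (ETM) and is queued for main memory.
    non_writeback.append(("write 1", "A"))

    # Second write hits an owned block and is written into the cache.
    cache["B"] = "write 2"

    # A cache coherency request for block B arrives before the first write
    # has left its queue, so the block is placed in the writeback queue.
    writeback.append((cache.pop("B"), "B"))

    bus_order = []
    while non_writeback or writeback:
        if writeback and (writeback_priority or not non_writeback):
            bus_order.append(writeback.popleft()[0])
        else:
            bus_order.append(non_writeback.popleft()[0])
    return bus_order
```

With writeback_priority set, the sketch returns ["write 2", "write 1"], which is the reordering at issue; without it, the writes appear on the bus in the order the processor issued them.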