1. Field of the Invention
The present invention relates generally to cache memories in a multiprocessor data processing system and more particularly to the use of temporary access states to indicate how data may be accessed via store-in cache memories.
2. Description of the Prior Art
Modern high performance stored program digital computers conventionally access instructions and data that reside in a main storage by using a cache memory. A cache memory is a memory located in close proximity to a processor; it is typically much smaller and much faster than the main storage of the computer. Virtually all high performance digital computers use cache memories and even some commercially available microprocessors include cache memories in their architectures.
Cache memories have been developed because it has not been possible to build extremely large memories at a reasonable cost that have a data access time commensurate with modern-day pipelined processors. It is, however, possible to build inexpensive, small memories that can keep up with the processors.
The design of a cache memory takes advantage of two properties which have been observed in data processing systems. The first property is known as temporal locality of reference. This property refers to the tendency of a processor to access an instruction or data value repeatedly during a relatively small time interval. The second property is known as spatial locality of reference. This property refers to the tendency of a processor, during a small time interval, to access data or instructions which have addresses in the main storage that differ by a relatively small value. These first and second properties form the rationale for keeping, in a cache memory, memory words containing both the most recently accessed data and instructions and respective lines of memory which are immediately adjacent to these most recently accessed words in the address space of the processor.
Cache memories may be used in both single processor and multiprocessor systems. In a type of multiprocessor system known as a tightly-coupled system, several central processors, which each have their own cache memories, share a common operating system and a common main storage. In a tightly-coupled system, it is desirable for each cache memory to operate in concert with all of the other cache memories. In particular, each processor should be able to obtain the most recently updated version of a data value regardless of whether it is located in main storage or in the cache memory of another processor. Consequently, it is desirable to constantly monitor the consistency of data among the caches. This monitoring operation is known as cache coherence control.
There are various types of cache memories in the prior art. One type is the store-through (ST) cache memory, in which a data store instruction has three parts: a store into the cache memory associated with the processor initiating the instruction, a store into the main storage, and a sequence of cross-interrogate (XI) actions which invalidate any local copies of the data in other cache memories on the system. Usually, ST cache memory designs require substantial main storage bandwidth to operate efficiently.
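The three parts of a store-through store can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the design of any cited system: the class name `StoreThroughSystem`, its methods, and the dictionary-based caches are hypothetical, and a line is modeled as a single value keyed by address.

```python
class StoreThroughSystem:
    """Toy model: each cache is a dict mapping address -> value (hypothetical)."""

    def __init__(self, num_processors):
        self.main_storage = {}
        self.caches = [{} for _ in range(num_processors)]
        self.xi_count = 0  # number of cross-interrogate invalidations issued

    def store(self, cpu, address, value):
        # Part 1: store into the cache of the processor issuing the store.
        self.caches[cpu][address] = value
        # Part 2: store into main storage.
        self.main_storage[address] = value
        # Part 3: XI actions invalidate copies held by every other cache.
        for other, cache in enumerate(self.caches):
            if other != cpu and address in cache:
                del cache[address]
                self.xi_count += 1

    def fetch(self, cpu, address):
        # A miss is filled from main storage, which is always current
        # in this model because every store writes through.
        cache = self.caches[cpu]
        if address not in cache:
            cache[address] = self.main_storage[address]
        return cache[address]
```

Every call to `store` touches main storage in this sketch, which is one way to see why ST designs depend on main storage bandwidth.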
Another type of design is the store-in (SI) cache. An SI cache design and its use by a multiprocessor system is described in U.S. Pat. No. 4,503,497 to Krygowski et al., which is hereby incorporated by reference. In an SI cache, data values are transferred among the various cache memories via a cache-to-cache (CTC) transfer bus. This type of cache also uses XI actions to coordinate the contents of the various cache memories. A central storage controller, which controls all memory access, contains a directory of the contents of the various cache memories. In the system described in the referenced patent, each line of adjacent memory words held by a cache memory has its access controlled by an exclusive/read-only (EX/RO) flag bit. Usually, data values are updated in the main storage only when lines of memory words are replaced from a cache memory and the data in the lines has been modified.
Lines of memory words may be replaced from a cache and written into the main storage in accordance with, for example, a least-recently-used (LRU) scheme. The transfer of a line of memory words from a cache to the main storage is known as a castout operation. Since data is written into the main storage only when it is cast out of a cache, the storage bandwidth requirements of a multiprocessor system using SI cache memories are lower than those of a comparable system using ST cache memories. However, this drop in bandwidth requirements is obtained at the expense of a more complex cache coherence control system and the penalties to the execution speed of the processors caused by the cache-to-cache copy operations and the concomitant XI events.
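LRU replacement with castout can be illustrated with a small single-cache sketch. It is an assumption-laden toy rather than any cited design: the class `StoreInCache`, its `capacity` parameter, and the use of one value per line are hypothetical, and coherence with other caches is deliberately left out.

```python
from collections import OrderedDict


class StoreInCache:
    """Toy store-in cache with LRU replacement (hypothetical model)."""

    def __init__(self, capacity, main_storage):
        self.capacity = capacity
        self.main_storage = main_storage  # dict: address -> value
        self.lines = OrderedDict()        # address -> [value, changed]; oldest first
        self.castouts = 0

    def _make_room(self):
        # Replace the least recently used line when the cache is full.
        if len(self.lines) >= self.capacity:
            victim, (value, changed) = self.lines.popitem(last=False)
            if changed:
                # Castout: only a modified line is written to main storage.
                self.main_storage[victim] = value
                self.castouts += 1

    def fetch(self, address):
        if address not in self.lines:
            self._make_room()
            # Addresses never written are assumed to read as 0.
            self.lines[address] = [self.main_storage.get(address, 0), False]
        self.lines.move_to_end(address)  # mark as most recently used
        return self.lines[address][0]

    def store(self, address, value):
        if address not in self.lines:
            self._make_room()
        # The store stays in the cache; main storage is not touched here.
        self.lines[address] = [value, True]
        self.lines.move_to_end(address)
```

Repeated stores to the same line generate no main storage traffic in this model until the line is replaced, which is the bandwidth saving described above.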
In a typical multiprocessor system using SI cache memories, each cache includes a cache directory which contains information as to whether the access to a line of memory words is read only (RO), exclusive (EX) or invalid (INV). In addition, if the line of memory words has an EX status, the directory contains information (a bit CH) indicating whether any data in the line has been changed.
If the status of a line is RO, the data in that line may only be read. Generally, a line of instruction words has this status. An RO cache line may exist simultaneously in several different cache memories.
If a line of memory words has a status of EX, that line may appear in the cache of only one processor. The processor which has the line in its cache is the only processor that is allowed to fetch data from or store data into the line of memory words. If, in addition, the directory entry, CH, for the line indicates that data in the line has been changed, then the data in the cache line does not match the corresponding data in main storage. When a line having its CH bit set is replaced in the cache, a copy is sent to the main storage via the castout action.
A line of memory words has the INV status when it is invalid. A status of EX or RO for a line in one cache memory is changed to INV when the line is obtained with an EX status by another cache memory.
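The three access states and the CH bit can be captured in a small data structure. The following is a minimal sketch; the names `Status` and `DirectoryEntry` and the method names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    INV = "invalid"
    RO = "read only"
    EX = "exclusive"


@dataclass
class DirectoryEntry:
    """Per-line cache directory entry (hypothetical layout)."""

    status: Status = Status.INV
    ch: bool = False  # changed bit; meaningful only while status is EX

    def can_fetch(self):
        # Both RO and EX lines may be read by the local processor.
        return self.status in (Status.RO, Status.EX)

    def can_store(self):
        # Only an EX line may be stored into.
        return self.status is Status.EX

    def store(self):
        if not self.can_store():
            raise PermissionError("stores require EX status")
        self.ch = True  # cached data now differs from main storage

    def needs_castout(self):
        # A changed line must be copied to main storage when replaced.
        return self.status is Status.EX and self.ch

    def invalidate(self):
        # Applied when another cache obtains the line with EX status.
        self.status = Status.INV
        self.ch = False
```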
A typical multiprocessor system using SI cache memories (e.g., as in U.S. Pat. No. 4,503,497 referenced above) operates as follows. Initially, a processor requests a data word via, for example, a data fetch instruction. Responsive to this request, the processor checks its own cache for a line containing the word. If the cache contains the line, the data word is passed to the processor in the normal course of instruction execution. If, however, the processor fails to find the line in its own cache, the request is passed on to the storage controller. The storage controller checks the request against a directory which indicates the contents of all of the caches. If the target line is found in a remote cache and its status is RO, it is copied over to the requesting cache. If its status is EX and its CH bit is reset, the line is copied to the requesting cache with a status of RO and the status of the line in the remote cache is changed to RO. If its status is EX and its CH bit is set, the line is copied to the requesting cache with the EX status and with its CH bit set; the copy of the line in the remote cache is invalidated. By keeping the EX status and the CH bit set for the line, the system anticipates that the processor will store data into the line in the near future. Finally, if the target line of data is not found in any cache, it is accessed from the main storage and assigned a status of RO.
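The storage controller's handling of a fetch miss, as described above, can be sketched as a single function. This is an illustrative model only: the function name `handle_fetch_miss`, the nested-dictionary directory, and the returned strings are hypothetical. It also assumes that a line copied from a remote RO copy is held RO by the requester, which the description above implies but does not state.

```python
from enum import Enum


class Status(Enum):
    INV = 0
    RO = 1
    EX = 2


def handle_fetch_miss(directory, requester, line):
    """Resolve a fetch miss and return where the data came from.

    directory maps cpu -> {line: (Status, ch)} and must contain an entry
    (possibly empty) for every cpu, including the requester.
    """
    for cpu, entries in directory.items():
        if cpu == requester:
            continue
        status, ch = entries.get(line, (Status.INV, False))
        if status is Status.RO:
            # Remote copy is RO: copy it over (requester assumed to hold RO).
            directory[requester][line] = (Status.RO, False)
            return "cache-to-cache"
        if status is Status.EX and not ch:
            # Remote copy is EX but unchanged: both copies become RO.
            entries[line] = (Status.RO, False)
            directory[requester][line] = (Status.RO, False)
            return "cache-to-cache"
        if status is Status.EX and ch:
            # Remote copy is EX and changed: it moves with EX status and
            # CH set, and the remote copy is invalidated.
            entries[line] = (Status.INV, False)
            directory[requester][line] = (Status.EX, True)
            return "cache-to-cache"
    # Not found in any cache: fetch from main storage with RO status.
    directory[requester][line] = (Status.RO, False)
    return "main storage"
```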
The approach outlined above may result in unnecessary loss of concurrency in a multiprocessor cache memory system. This may occur, for example, when data in a line is frequently accessed and only infrequently modified. Once a line is modified by a processor, it will have an EX status and will have its CH bit set. This state will continue until the line is replaced from a cache through the LRU replacement algorithm. In this example, each of the frequent fetches for the line may cause an XI event to invalidate the copy of the line in the remote cache.
This anomaly occurs because once data in a line having a status of EX is changed, its CH bit remains set until the line is cast out to main storage. As caches grow larger and the number of processors used in a multiprocessor system increases, this problem becomes even greater. With larger caches, lines do not age out as quickly and, as the number of processors increases, there is a greater tendency for a line to be passed around among the processors via XI events without getting a chance to age out.
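The cost of this behavior can be made concrete by counting invalidations for a single line. The sketch below is a simplified model under stated assumptions: the function `count_invalidations` is hypothetical, the line is assumed to have been stored into once by the first processor listed, and the line is assumed never to be cast out during the sequence.

```python
def count_invalidations(fetch_sequence):
    """Count XI invalidations for one line that is EX with its CH bit set.

    fetch_sequence lists the processor issuing each fetch. The first
    processor listed is assumed to hold the changed line at the start.
    """
    owner = fetch_sequence[0]
    invalidations = 0
    for cpu in fetch_sequence[1:]:
        if cpu != owner:
            # The requester misses, the line moves to it with EX status
            # and CH set, and the previous holder's copy is invalidated.
            invalidations += 1
            owner = cpu
    return invalidations
```

For example, two processors that alternately fetch the same changed line, as in `count_invalidations([0, 1, 0, 1, 0])`, incur one invalidation per fetch after the first in this model, even though no further stores occur.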
U.S. Pat. No. 4,484,267 to Fletcher relates to a hybrid cache architecture in which data in a line may be treated as in a store-in cache in some situations and as in a store-through cache in other situations. In the system described in this patent, the EX/RO bit in the cache directory is replaced by a shared (SH) bit. When a line of data is first accessed, the SH bit for the line is set to zero. This causes the line to be treated as if it were in an SI cache and had the EX status. If another processor accesses the data in that line, the storage control element sets the SH bit to 1 allowing the cache coupled to the other processor to copy the line via the CTC bus. If either processor then requests a store operation for data in the line, the new data is stored in the line in the requesting cache and in main storage as in an ST cache. At the same time, an XI event invalidates all copies of the line in other caches. The SH bit in the requesting cache remains set, however, so that if another cache attempts to access the data, the request may be satisfied by copying the line over the CTC bus.
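The SH-bit behavior described above can be sketched for a single line. This is an illustrative model, not the implementation of the cited patent: the class `HybridLineModel` and its methods are hypothetical. It further assumes that the copying cache also records SH as 1 and that a store by a processor without a copy first obtains the line; neither point is stated in the description above.

```python
class HybridLineModel:
    """One line tracked across caches with a per-cache SH bit (hypothetical)."""

    def __init__(self, first_cpu, value):
        self.main_storage = value
        self.copies = {first_cpu: value}  # cpu -> cached data
        self.sh = {first_cpu: 0}          # cpu -> SH bit; 0 on first access

    def fetch(self, cpu):
        if cpu not in self.copies:
            # Another processor accesses the line: set SH to 1 and copy
            # the line from a holding cache (stands in for the CTC bus).
            holder = next(iter(self.copies))
            self.sh[holder] = 1
            self.copies[cpu] = self.copies[holder]
            self.sh[cpu] = 1  # assumption: the new copy is also marked shared
        return self.copies[cpu]

    def store(self, cpu, value):
        if cpu not in self.copies:
            self.fetch(cpu)  # assumption: obtain the line before storing
        self.copies[cpu] = value
        if self.sh[cpu] == 1:
            # Shared line: behave like a store-through cache.
            self.main_storage = value
            for other in [c for c in self.copies if c != cpu]:
                del self.copies[other]  # XI invalidation of other copies
                del self.sh[other]
        # The SH bit in the storing cache is left unchanged.
```

While SH is 0 a store stays in the cache, as in an SI design; once SH is 1 every store also updates main storage, as in an ST design.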
U.S. Pat. No. 4,394,731 to Flusche et al. relates to a multiprocessor SI cache memory system. In this system, a line having an EX status may be obtained in two instances: first, when there is no copy of the line in any of the remote caches and, second, when the line exists in a remote cache with a status of EX and has its CH bit set. This cache system does not include a cache-to-cache bus, so, in the second instance described above, the changed line is cast out to main storage and then it is assigned to the requesting cache with EX status. If the line of data in the remote cache has a status of EX and a reset CH bit, the status in the remote cache is downgraded to RO and the line is copied from main storage to the requesting cache with an RO status.
U.S. Pat. No. 4,503,497 to Krygowski et al. relates to a multiprocessor SI cache memory system that includes a cache-to-cache transfer bus. This system transfers a target line that has an EX status and a set CH bit from a remote cache to the requesting cache without changing the status of the line or the state of its CH bit and without accessing main storage. The target line is invalidated in the remote cache.