Prior multiple-processor systems have used processor-private store-in L1 caches; and they have maintained the coherence of data in the system by using a set of copy directories, which are copies of all L1 cache directories. Each processor's fetch request is cross-interrogated in the copy directories of all other processors to find if any other processor has a copy of a requested data unit. This process assures that only one processor at a time can have exclusive (EX) ownership for writing in a data unit in the system. Only the one processor that has exclusive ownership of a data unit is allowed to write into the data unit. A data unit can also have public ownership (previously called readonly (RO) authority) which allows all processors to read (fetch) the data unit, but prohibits all processors from writing into the data unit.
The data coherence problem is simpler with a store-through type of cache, which requires that all stores made in the L1 cache also be made concurrently in a backing memory. The memory backing the L1 private processor caches may be a shared L2 cache, or it may be the L3 main memory. The shared L2 cache may be store-in or store-through, but preferably is store-in to reduce the store bus traffic to main memory.
The store-in type of cache has been used in computer systems because it requires less bandwidth for its memory bus (between the memory and the cache) than is required by a store-through type of cache for the same frequency of processor accesses. Each cache location may be assigned to a processor request and receive a copy of a data unit fetched from system main memory or from another cache in the system. With a store-in cache, a processor stores into a data unit in a cache location without storing into the correspondingly addressed data unit in main memory, which causes the cache location to become the only location in the system containing the latest changed version of the data unit. The processor may make as many stores (changes) in the data unit as its executing program requires. The integrity of data in the system requires that the latest version of any data unit be used for any subsequent processing of the data unit.
A store-through type of cache is used only for fetching, and maintains the latest version of its accessed data units by having every store access change both the processor's store-through cache and the same data unit in a memory (another cache or main storage) at the next level in the system storage hierarchy. But the store-through characteristic of such caches does not solve the coherence problem in the system, since another processor's store-through cache could contain an older version of the same data unit. Therefore, cross-interrogation of the contents of private processor caches in multiple-processor systems is needed, whether they are store-in or store-through, when a new request is being fetched into a processor cache.
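The stale-copy hazard described above can be illustrated with a minimal sketch. The class `StoreThroughCache`, its methods, and the use of a Python dict as the backing memory are all hypothetical names chosen for illustration; they are not taken from any described hardware. The sketch assumes that invalidating the other processor's copy is the corrective action.

```python
class StoreThroughCache:
    """Hypothetical private store-through cache backed by a shared memory dict."""

    def __init__(self, memory):
        self.memory = memory      # next level of the storage hierarchy
        self.lines = {}           # address -> locally cached value

    def fetch(self, addr):
        # On a miss, copy the data unit from the backing memory.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        # Store-through: update the local copy and the backing memory together.
        self.lines[addr] = value
        self.memory[addr] = value

    def invalidate(self, addr):
        # Effect of a cross-invalidate signal on this cache (assumed behavior).
        self.lines.pop(addr, None)


memory = {0x100: 1}
cpu_a = StoreThroughCache(memory)
cpu_b = StoreThroughCache(memory)

cpu_b.fetch(0x100)          # B caches the old value
cpu_a.store(0x100, 2)       # A stores through; backing memory is updated
stale = cpu_b.fetch(0x100)  # B still hits on its older copy

cpu_b.invalidate(0x100)     # invalidation removes B's copy
fresh = cpu_b.fetch(0x100)  # B misses and refetches the latest version
```

Although the backing memory always holds the latest version, processor B keeps reading its older copy until that copy is invalidated, which is why store-through alone does not provide coherence.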
Exclusive ownership (authority to change a cache data unit) is assigned to a processor before it is allowed to perform its first store operation in a data unit. The assignment of processor ownership has conventionally been done by setting an exclusive (EX) flag bit in a cache directory (sometimes called a tag directory) associated with the respective data unit in the cache. The on state of the EX flag bit typically indicates exclusive ownership, and the off state indicates public ownership (called "read-only authority"). Exclusive ownership by a processor allows only that processor to store into the data unit. Public (read-only) ownership of a data unit does not allow any processor to store into that data unit, but allows any or all processors in the system to read it (which can result in multiple copies of the non-changeable data unit in different processor caches in the system).
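The meaning of the EX flag bit can be sketched as a small data structure. The names `DirectoryEntry`, `may_store`, and `may_fetch` are hypothetical, and the `valid` field is an assumption added so that an invalidated entry grants no authority.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical cache directory entry for one data unit."""
    address: int
    valid: bool = True
    ex: bool = False  # on = exclusive (EX); off = public, read-only (RO)


def may_store(entry):
    # Only an exclusively owned, valid data unit may be written.
    return entry.valid and entry.ex


def may_fetch(entry):
    # Any valid data unit may be read, whether EX or RO.
    return entry.valid


owned = DirectoryEntry(address=0x40, ex=True)
shared = DirectoryEntry(address=0x40)
```

Here `owned` permits both fetching and storing, while `shared` permits fetching only.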
Typically, a cache fetches data units from its storage hierarchy on a demand basis, and a processor cache miss generates a fetch request which is sent to the next level in the storage hierarchy for fetching the data unit.
A store-in cache transmits its changed data units to main memory under control of cache replacement controls, sometimes called the LRU controls. Replacement of a data unit may occur when it has not been recently accessed in the cache and no other cache entry is available for a new request. This replacement process is sometimes called "aging out" when a least recently used (LRU) entry is chosen to be replaced with a new request. The replacement controls cause the data unit (whether changed or not) in the selected entry to be replaced by another data unit (fetched as a result of a cache miss). When the data unit to be replaced has been changed, it must be cast out of the cache and written into another place, such as main memory, before it is lost by being overwritten by the newly requested data unit. For example, a processor may request a data unit not currently in the cache, which must be fetched from main memory (or from another cache) using the requested address and stored in the newly assigned LRU cache location. The new data unit will be assigned a cache location not in current use if one can be found. If all of the usable cache locations are currently occupied with changed data units, then one of them must be reassigned for the new request, and the updated data unit in that location must be cast out to main memory before the new data unit is written into the cache location. The castout data unit has its ownership changed from exclusive processor ownership to main memory ownership.
If a data unit has not been changed in the cache, it is merely overlaid by the replacing data unit without any castout, since its backing copy in main memory is identical.
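The replacement and castout behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class `StoreInCache`, its `castouts` log, and the use of a dict as main memory are hypothetical, and strict LRU ordering is assumed for the replacement controls.

```python
from collections import OrderedDict


class StoreInCache:
    """Hypothetical store-in cache with LRU replacement and castout."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory
        self.lines = OrderedDict()  # addr -> [value, changed]; oldest first
        self.castouts = []          # addresses written back, for illustration

    def _fill(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return
        if len(self.lines) >= self.capacity:
            # Select the least recently used entry for replacement.
            victim, (value, changed) = self.lines.popitem(last=False)
            if changed:
                # A changed data unit is cast out before being overwritten.
                self.memory[victim] = value
                self.castouts.append(victim)
            # An unchanged data unit is simply overlaid; memory is identical.
        self.lines[addr] = [self.memory[addr], False]

    def fetch(self, addr):
        self._fill(addr)
        return self.lines[addr][0]

    def store(self, addr, value):
        # Store-in: change only the cache copy and mark it changed.
        self._fill(addr)
        self.lines[addr] = [value, True]


memory = {0: 10, 1: 11, 2: 12}
cache = StoreInCache(capacity=2, memory=memory)
cache.store(0, 99)            # latest version exists only in the cache
cache.fetch(1)
before_castout = memory[0]    # main memory still holds the old value
cache.fetch(2)                # replaces address 0, which is cast out
cache.fetch(0)                # replaces address 1, unchanged, no castout
```

Only the changed data unit at address 0 is written back; the unchanged data unit at address 1 is overlaid without any memory traffic.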
U.S. Pat. No. 4,394,731 to Flusche et al. teaches the use of an exclusive/readonly (EX/RO) flag in each entry in each private processor store-in cache directory for data coherence control in a computer system. A copy directory was provided for each processor's private L1 directory to identify the respective processor ownership of all data units currently in its cache, and the set of all processor copy directories was used to recognize which processor owned, or was publicly using, a data unit being requested exclusively by another processor in the system. Cross-interrogation was the process used among the copy directories to identify which processor, if any, had exclusive or public ownership of a data unit; it was done by comparing the address of a requested data unit with the addresses in all copy directories. A requested address found in a copy directory identified a processor cache having that data unit. Cross-invalidation (XI) signalling was then done from the identified processor's copy directory to its L1 cache to invalidate the entry for that data unit before passing the ownership of the data unit to another processor's cache.
This XI process assured exclusivity of a data unit to only one processor at a time by invalidating any copy of the data unit found in any other processor's private cache.
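Cross-interrogation followed by cross-invalidation can be sketched for the case of an exclusive request. The function names and the representation of each copy directory as a dict from address to ownership string are hypothetical. The sketch models only the directories; any castout of changed data, and the signalling from a copy directory to its L1 cache, are omitted.

```python
def cross_interrogate(copy_dirs, requester, addr):
    """Return the other processors whose copy directory holds addr."""
    return [cpu for cpu, directory in copy_dirs.items()
            if cpu != requester and addr in directory]


def request_exclusive(copy_dirs, requester, addr):
    """Invalidate every other copy, then record EX ownership for requester."""
    holders = cross_interrogate(copy_dirs, requester, addr)
    for cpu in holders:
        del copy_dirs[cpu][addr]       # effect of the XI signal (assumed)
    copy_dirs[requester][addr] = "EX"
    return holders


copy_dirs = {"P0": {0x80: "RO"}, "P1": {0x80: "RO"}, "P2": {}}
invalidated = request_exclusive(copy_dirs, "P2", 0x80)
```

After the request, only `P2` holds the data unit, which is the exclusivity guarantee the XI process provides.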
Hence, only one of the plural processors in a multiprocessing (MP) system can have exclusive ownership (write authority) at any one time over any data unit. The exclusive ownership over any data unit may be changed from one processor to another when a different processor requests exclusive ownership. The prior mechanism for indicating exclusive ownership for a processor was to provide an exclusive (EX) flag bit in each L1 directory entry in a processor's private L1 cache; and the EX bit was set on to indicate which of the associated data units were "owned" by that processor. The reset state of the EX flag bit indicated public ownership, which was called "readonly authority" for the associated data unit that made it simultaneously available to all processors in the system. Thus each valid data unit in any processor's private L1 cache had either exclusive ownership or public ownership.
The copy-directory XI technique of prior U.S. Pat. No. 4,394,731 automatically assigned the following ownership to a data unit fetched from main storage into a processor's private L1 store-in cache:
1. EX ownership when the data unit is not found in any processor's copy directory.
2. EX ownership when the data unit is found changed with EX ownership in another processor's copy directory. The requested data unit is cast out of the other processor's cache before it is fetched into the requesting processor's cache.
3. RO ownership when the data unit is found not changed with EX ownership in another processor's copy directory, and the new request is deemed not likely to change the data unit (fetch request). Also, the found data unit is left in its cache where its ownership is changed from EX to RO.
4. EX ownership when the data unit is found with RO ownership in one or more other processors' copy directories, and the new request is deemed likely to change the data unit (store interrogate request). The found data unit is invalidated in each other processor's cache. This XI operation uses a time-consuming process called "promote to exclusive".
5. RO ownership when the data unit is found with RO ownership in another processor's copy directory. Also, the found data unit is left in its processor's cache with its RO ownership.
6. RO ownership when the data unit is a page table entry found with RO public ownership set in the entry, regardless of the type of processor request.
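The six assignment rules listed above can be summarized as a decision function. This is a sketch under stated assumptions: the names `Found` and `assign_ownership`, the action strings, and the choice to test the page table entry rule first are all hypothetical. The list does not state what happens when a store interrogate request finds the data unit unchanged with EX ownership elsewhere, so the sketch raises an error for that case rather than guess.

```python
from enum import Enum


class Found(Enum):
    """State of the data unit in the other processors' copy directories."""
    NOT_FOUND = "not found"
    EX_CHANGED = "EX, changed"
    EX_UNCHANGED = "EX, not changed"
    RO = "RO"


def assign_ownership(found, store_interrogate=False, pte_ro=False):
    """Return (ownership given to requester, action on the other cache)."""
    if pte_ro:
        return ("RO", None)             # rule 6; action on others not stated
    if found is Found.NOT_FOUND:
        return ("EX", None)             # rule 1
    if found is Found.EX_CHANGED:
        return ("EX", "castout")        # rule 2
    if found is Found.EX_UNCHANGED and not store_interrogate:
        return ("RO", "demote to RO")   # rule 3
    if found is Found.RO:
        if store_interrogate:
            return ("EX", "invalidate") # rule 4, "promote to exclusive"
        return ("RO", "leave as RO")    # rule 5
    # Not covered by the listed rules; left unspecified on purpose.
    raise NotImplementedError("case not covered by the listed rules")
```

For example, `assign_ownership(Found.RO, store_interrogate=True)` yields EX ownership with invalidation of the other copies, while the same state with a plain fetch request yields RO ownership and leaves the other copies in place.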
Patent application Ser. No. 07/680,176 filed on the same date as the subject application and assigned to the same assignee describes and claims an ownership interlock control for cache data units. It interlocks a change of ownership for an exclusively-owned data unit in a store-in cache with the completion of all stores to the data unit issued by its processor up to the time it responds to a received cross-invalidate (XI) signal caused by another processor requesting the data unit either exclusively or with public ownership.