1. Field of the Invention
This invention generally relates to control of cache memory coherence in a multi-processor (MP) data processing system and, more particularly, to an authorization mechanism that provides each processor precise authorization for accesses, either reads or writes.
2. Description of the Prior Art
High performance MP computer systems are being developed to increase throughput by performing in parallel those operations which can run concurrently on separate processors. Such systems are characterized by multiple central processors (CPs) operating independently and in parallel, but occasionally communicating with one another or with a main storage (MS) when data needs to be exchanged. The CPs and the MS have input/output (I/O) ports which must be connected to exchange data.
In the type of MP system known as the tightly coupled multi-processor system, in which each CP has its own cache, coherence problems exist at various levels of the system. More specifically, inconsistencies can occur between adjacent levels (i.e., first level caches, second level caches, etc.) of a memory hierarchy. The multiple caches could, for example, possess different versions of the same data because one of the CPs has modified its copy. It is therefore necessary for each processor's cache to know what has happened to lines that may be in several caches at the same time. In an MP system where many CPs share the same main storage, each CP is required to obtain the most recently updated version of data, according to architecture specifications, when an access is issued. This requirement necessitates constant monitoring of data consistency among caches.
A number of solutions have been proposed to the cache coherence problem. Earlier solutions are described by C. K. Tang in "Cache System Design in the Tightly Coupled Multiprocessor System", Proceedings of the AFIPS (1976), and L. M. Censier and P. Feautrier in "A New Solution to Coherence Problems in Multicache Systems", IEEE Transactions on Computers, December 1978, pp. 1112 to 1118. These proposals allow shared writable data to exist in multiple caches which use a centralized global directory. The global directory stores the status of memory blocks so that cache cross-interrogates (XI) can be generated on the basis of the block status. To maintain consistency, XI signals are propagated with associated block addresses to other caches to either invalidate (INV) or update the referenced block. Any number of caches may have read only (RO) copies of a block, but to modify a block in its cache, a processor must have read and write (RW) access. A block is tagged as exclusive (EX) if the cache is the only cache that has a copy of that block. Tang proposed using local cache copy directories as the global directory, which is located in the storage control element (SCE) that may also incorporate the crossbar switch that interconnects the processors and the memory devices. Censier et al. proposed using memory flags for the recording of storage block status instead.
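The directory-based scheme described above can be illustrated with a minimal sketch. It assumes the invalidate variant of XI (not the update variant), and the names `GlobalDirectory`, `fetch_ro`, and `fetch_ex` are hypothetical, chosen for illustration only; they do not appear in the cited references.

```python
from enum import Enum


class Status(Enum):
    RO = "read-only"
    EX = "exclusive"


class GlobalDirectory:
    """Hypothetical centralized directory: block -> (status, holder caches)."""

    def __init__(self):
        self.blocks = {}  # block address -> (Status, set of cache ids)

    def fetch_ro(self, cache_id, block):
        """Grant a read-only copy; return the caches that receive an XI."""
        status, holders = self.blocks.get(block, (Status.RO, set()))
        if status is Status.EX:
            if holders == {cache_id}:
                return set()  # requester already owns the block
            # The remote EX owner is cross-interrogated and invalidated.
            xi_targets = set(holders)
            self.blocks[block] = (Status.RO, {cache_id})
            return xi_targets
        # Any number of caches may hold RO copies.
        self.blocks[block] = (Status.RO, holders | {cache_id})
        return set()

    def fetch_ex(self, cache_id, block):
        """Grant exclusive (RW) access; return the caches sent XI-invalidate."""
        _, holders = self.blocks.get(block, (Status.RO, set()))
        xi_targets = holders - {cache_id}
        self.blocks[block] = (Status.EX, {cache_id})
        return xi_targets
```

In this sketch, two caches may fetch the same block RO without any XI, while a third cache requesting EX causes an XI-invalidate to be sent to both RO holders.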
There are various types of caches in prior art MP systems. One type of cache is the store through (ST) cache as described in U.S. Pat. No. 4,142,234 for IBM System/370 Model 3033 MP. An ST cache design does not interfere with the CP storing data directly to main storage (or a second level cache), so that changes to data are always reflected in main storage. When a store through updates main storage, appropriate cross-interrogate (XI) actions may take place to invalidate possible remote copies of the stored cache line. The storage control element (SCE) maintains proper store stacks to queue the main storage (MS) store requests, and standard communications between the buffer control element (BCE) and the SCE avoid store stack overflow conditions. When the SCE store stack becomes full, the associated BCE holds its MS stores until the condition is cleared. In U.S. Pat. No. 4,142,234, Bean et al. proposed a BIAS filter memory mechanism for filtering out unnecessary invalidate interrogations of cache directories.
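The store stack behavior described above can be sketched as follows. This is an illustrative model only: the class names `StoreStack` and `BCE`, the stack depth, and the explicit `flush` call are assumptions, and the actual BCE/SCE signalling of the cited patent is not modeled.

```python
from collections import deque


class StoreStack:
    """Hypothetical SCE store stack that queues MS store requests."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    def full(self):
        return len(self.pending) >= self.depth

    def enqueue(self, addr, data):
        if self.full():
            return False  # caller must hold the store
        self.pending.append((addr, data))
        return True

    def drain_one(self, main_storage):
        """Complete the oldest queued store to main storage."""
        if self.pending:
            addr, data = self.pending.popleft()
            main_storage[addr] = data


class BCE:
    """Hypothetical buffer control element that holds stores while the stack is full."""

    def __init__(self, stack):
        self.stack = stack
        self.held = deque()

    def store(self, addr, data):
        self.held.append((addr, data))
        self.flush()

    def flush(self):
        # Forward held stores in order until the stack fills up again.
        while self.held and not self.stack.full():
            self.stack.enqueue(*self.held.popleft())
```

With a stack depth of two, a third store stays held at the BCE until the SCE drains one entry and the BCE retries.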
Another type of cache design is the store-in cache (SIC) as described, for example, in U.S. Pat. Nos. 3,735,360 to Anderson et al. and 4,771,137 to Warner et al. A SIC cache directory is described in detail in U.S. Pat. No. 4,394,731 to Flusche et al., in which each line in a store-in cache has its multi-processor shareability controlled by an exclusive/read only (EX/RO) flag bit. The main difference between ST and SIC caches is that all stores in a SIC are directed to the cache itself (which may cause a cache miss if the stored line is not in the SIC cache). It is also proposed in U.S. Pat. No. 4,503,497 that data transfers upon a miss fetch can take place through a cache-to-cache transfer (CTC) bus if a copy is in the remote cache. An SCE is used that contains copies of the directories in each cache. This permits cross-interrogate (XI) decisions to be resolved at the SCE. Usually, cache line modifications are written back to main storage only when the lines are replaced in the cache.
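The difference between the two store policies can be illustrated with a small sketch. Several details are assumptions made only for illustration: the class names, the dictionary standing in for main storage, first-in-first-out replacement in the store-in cache, and the store through cache not allocating a line on a store miss.

```python
class StoreThroughCache:
    """Every store reaches main storage immediately."""

    def __init__(self, main_storage):
        self.ms = main_storage  # dict: line address -> data
        self.lines = {}

    def store(self, addr, data):
        if addr in self.lines:
            self.lines[addr] = data  # keep a resident copy current
        self.ms[addr] = data         # the store always goes through


class StoreInCache:
    """Stores go to the cache; main storage is updated on replacement."""

    def __init__(self, main_storage, capacity):
        self.ms = main_storage
        self.capacity = capacity
        self.lines = {}      # insertion order doubles as FIFO replacement order
        self.dirty = set()
        self.misses = 0

    def store(self, addr, data):
        if addr not in self.lines:
            self.misses += 1  # a store to an absent line is a cache miss
            if len(self.lines) >= self.capacity:
                self._replace()
            self.lines[addr] = self.ms.get(addr)  # miss fetch
        self.lines[addr] = data
        self.dirty.add(addr)

    def _replace(self):
        victim = next(iter(self.lines))  # oldest resident line
        data = self.lines.pop(victim)
        if victim in self.dirty:
            self.ms[victim] = data       # write back only on replacement
            self.dirty.discard(victim)
```

In the store-in case, main storage keeps the stale value of a modified line until that line is replaced, which is why XI decisions must consult the cache directories rather than main storage alone.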
In conventional cross-interrogate (XI) methods, when a block B is locked EX for CP P_i, any existing cache lines covered by B in remote caches will be invalidated. When the block size is bigger than the first level cache line size, this often results in redundant XI-invalidates; i.e., lines get invalidated without being modified at other CPs. The overhead due to extra cache misses and directory handling becomes intolerable when the block size is much (e.g., 16 times) bigger than the line size.
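The source of the redundant XI-invalidates can be seen in a short sketch. It assumes a block covering 16 consecutive lines (the ratio given as an example above) and represents a remote cache simply as a set of resident line numbers; the function names are hypothetical.

```python
LINES_PER_BLOCK = 16  # assumed ratio of block size to line size


def block_of(line):
    """Block number that covers a given line number."""
    return line // LINES_PER_BLOCK


def lock_block_ex(block, remote_lines):
    """Invalidate every remote line covered by the block; return those lines."""
    invalidated = {line for line in remote_lines if block_of(line) == block}
    remote_lines.difference_update(invalidated)  # lines leave the remote cache
    return invalidated


def redundant_invalidates(invalidated, lines_stored_by_owner):
    """Invalidated lines that the EX owner never actually modified."""
    return invalidated - lines_stored_by_owner
```

For instance, if a remote cache holds lines 0, 3, and 5 of block 0 and the EX owner stores only into line 3, two of the three invalidations were unnecessary, and each may later cost the remote CP an extra cache miss.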
From experiments, significant spatial localities have been observed for both fetches and stores. By performing EX locking with bigger granularity blocks, a significant reduction in the rate of Non-EX-stores can be achieved. As a result, it is valuable to be able to perform large granularity EX locking without causing the above-mentioned performance overhead.
In conventional MP designs, the status (e.g., RO, EX, etc.) of a cache line is recorded using status bits in the directory of the cache. A valid cache line is normally at least readable, if not immediately writable, by the CP. For example, consider the cache design of IBM 3081 MP systems. When a valid cache entry is not exclusive (when the EX bit in the associated directory entry is OFF), the line is automatically considered read only (RO) and can be read by the associated CP. There is no known prior art in which all accesses (both read and write) to a valid cache line must go through separate authorization mechanisms.
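The conventional rule, in which validity alone implies read permission, can be stated compactly. The following is a sketch only; `DirectoryEntry`, `can_read`, and `can_write` are hypothetical names, and real directory entries carry additional fields not shown here.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    valid: bool = False  # line is present in the cache
    ex: bool = False     # EX bit: line is held exclusively


def can_read(entry):
    # A valid line is always at least readable; no separate check is made.
    return entry.valid


def can_write(entry):
    # Writing additionally requires the EX bit to be ON.
    return entry.valid and entry.ex
```

Note that `can_read` depends only on the valid bit, which reflects the absence of any separate read authorization in these designs.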