1. Field of the Invention
The invention relates generally to caches in a multiprocessor system and more particularly to the use of an additional change bit in a cache directory to indicate whether a changed line in the cache was changed by that cache or another.
2. Description of the Prior Art
Modern high performance stored program digital computers conventionally fetch instructions and data from main memory and store the fetched instructions and data in a cache memory. A cache is a local memory that is typically much smaller and much faster than the main memory of the computer. Virtually all high performance digital computers use a cache and even some commercially available microprocessors have local caches.
Caches have been developed because it has not been possible to build extremely large memories at a reasonable cost that operate with an access time commensurate with modern pipelined processors. It is however possible to build inexpensive, small memories that can keep up with the processor. Since an instruction in the cache can be immediately accessed by the processor, caches have been used to speed up computer performance.
It has been observed that items (either instructions or data) once referenced tend to be referenced again in the near future. This property is known as the "Temporal Locality of Reference" and it is a rationale for keeping the most recently referenced items in the cache. It has also been observed that if an item is referenced, then other items that are physically close to the referenced item are also likely to be referenced. This second property is known as "Spatial Locality of Reference" and it is a rationale for keeping cache lines that are blocks of contiguous items.
Caches can be used in both multiprocessor and uniprocessor systems. In the type of multiprocessor (MP) system known as the tightly coupled multiprocessor system, in which several processors (CPs) each have their own cache and share a common operating system and memory, there are additional problems since it is necessary for each cache to know what has happened to lines which may be in several caches simultaneously. In a multiprocessor system where many CPs share the same main storage, each CP is required to obtain the most recently updated version of data, according to architecture specifications, when an access is issued. This requirement necessitates constant monitoring of possible data inconsistencies among caches, often known as the cache coherence problem.
There are various types of caches in prior art multiprocessor systems. One type of cache is the store through (ST) cache, which does not interfere with the CP storing data directly to the main storage (or second level cache), so that changes of data are always updated to main storage. Upon the update of a store through to main storage, appropriate cross interrogate (XI) actions may take place to invalidate the copies of a cache line located in caches other than the one at the CP that initiated the store. Usually store through cache designs require substantial main storage bandwidth to incorporate the data stores.
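The store through behavior described above can be illustrated with a minimal sketch. The class name StoreThroughSystem, its method names, and the choice not to allocate a line on a store miss are assumptions made for illustration only; they are not taken from any cited design.

```python
class StoreThroughSystem:
    """Toy model: each CP has a private store through cache over one main storage."""

    def __init__(self, num_cps):
        self.main_storage = {}                           # line address -> data
        self.caches = [dict() for _ in range(num_cps)]   # per-CP cache: address -> data
        self.storage_writes = 0                          # count of main storage updates

    def fetch(self, cp, addr):
        cache = self.caches[cp]
        if addr not in cache:
            # Miss: load the line from main storage, which is always up to date.
            cache[addr] = self.main_storage.get(addr)
        return cache[addr]

    def store(self, cp, addr, data):
        # Every store is sent through to main storage.
        self.main_storage[addr] = data
        self.storage_writes += 1
        # Assumption: the local copy is updated only if the line is resident.
        if addr in self.caches[cp]:
            self.caches[cp][addr] = data
        # Cross interrogate (XI): invalidate copies held by all other caches.
        for other, cache in enumerate(self.caches):
            if other != cp:
                cache.pop(addr, None)
```

In this sketch the counter storage_writes grows with every store, which mirrors the main storage bandwidth cost noted above.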
Another type of cache design is the store-in cache (SIC). SICs are described in U.S. Pat. Nos. 3,735,360 to Anderson et al. and 3,771,137 to Warner et al. A SIC cache directory is described in detail in U.S. Pat. No. 4,394,731 to Flusche et al., in which each line in a store-in cache has its multiprocessor shareability controlled by an exclusive/read only (EX/RO) flag bit. The main difference between ST and SIC caches is that all stores in a SIC are directed to the cache itself (which may cause a cache miss if the stored line is not in the SIC cache). In a store-in cache design, data transfers upon a fetch miss can take place through a cache to cache transfer bus (CTC) if a copy is in the remote cache. A storage control element is used which contains copies of the directories in each cache. This permits cross interrogate (XI) decisions to be resolved fairly efficiently. Usually cache line modifications are updated to main storage only when the lines are replaced from the cache.
Lines in a cache are typically replaced in accordance with a replacement algorithm which usually ages out a least recently used line. In a store-in cache design, when a modified/changed cache line ages out it is also written to memory. As a result, in the store-in cache design main storage bandwidth demand is reduced at the expense of more complex coherence control and the penalties arising from cross interrogate castouts. Cross interrogate castouts occur when a data access from one CP finds a line modified in the cache of another CP.
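The replacement behavior of a store-in cache can be sketched for a single CP as follows. The class StoreInCache and its fields are hypothetical, and multiprocessor coherence (XI and cache to cache transfer) is deliberately left out of this illustration.

```python
from collections import OrderedDict


class StoreInCache:
    """Toy single-CP store-in cache with least recently used (LRU) replacement."""

    def __init__(self, capacity, main_storage):
        self.capacity = capacity
        self.main_storage = main_storage   # dict: line address -> data
        self.lines = OrderedDict()         # address -> [data, changed]; oldest first
        self.castouts = 0                  # changed lines written back on replacement

    def _access(self, addr):
        if addr not in self.lines:
            # Miss: make room if needed, then bring the line in from main storage.
            if len(self.lines) >= self.capacity:
                self._replace_lru()
            self.lines[addr] = [self.main_storage.get(addr), False]
        self.lines.move_to_end(addr)       # mark as most recently used
        return self.lines[addr]

    def _replace_lru(self):
        victim, (data, changed) = self.lines.popitem(last=False)
        if changed:
            # Castout: main storage is updated only when a changed line ages out.
            self.main_storage[victim] = data
            self.castouts += 1

    def fetch(self, addr):
        return self._access(addr)[0]

    def store(self, addr, data):
        entry = self._access(addr)         # stores are directed to the cache itself
        entry[0] = data
        entry[1] = True                    # set the change (CH) indication
```

Repeated stores to a resident line cause no main storage traffic in this sketch; only the eventual castout of the changed line does.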
The cache directory contains information as to whether the line is read only (RO), exclusive (EX), changed (CH) or invalid (INV).
A cache line that is RO is valid only in a read only state. The processor can only fetch from the line. Stores into the line are prohibited. The cache line may be shared simultaneously among different caches.
A cache line that is EX is valid but only appears in the cache of one processor. It is not resident in any other (remote) cache. The one (owning) processor is allowed to store into the line. A cache line that is CH is not only valid and EX but has also been stored into. That is, the copy in main storage may not be up to date. When a CH line is replaced, a copy is sent to main storage via a castout action.
An INV cache line is a line that is invalid.
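The four directory states and the operations each one permits can be summarized in a short sketch. Representing the states as a single enumeration, and the helper function names, are assumptions for illustration; an actual directory would typically encode these conditions as individual bits.

```python
from enum import Enum


class LineState(Enum):
    INV = "invalid"
    RO = "read only"
    EX = "exclusive"
    CH = "changed"    # valid, exclusive, and stored into


def can_fetch(state):
    """Any valid line may be fetched from."""
    return state is not LineState.INV


def can_store(state):
    """Only the owning processor of an EX or CH line may store into it."""
    return state in (LineState.EX, LineState.CH)


def may_be_shared(state):
    """Only RO lines may reside in several caches simultaneously."""
    return state is LineState.RO


def needs_castout_on_replace(state):
    """A CH line must be copied to main storage when it is replaced."""
    return state is LineState.CH


def state_after_store(state):
    """A store into an exclusively held line leaves it in the CH state."""
    if not can_store(state):
        raise PermissionError(f"store prohibited for a line that is {state.value}")
    return LineState.CH
```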
In a typical computer system a first CP, P1, may request an instruction or data from a line in a cache. Its own cache will be checked, and if the particular line requested is read only (RO), P1 may make a store request and, via the storage control element (SCE), make that line exclusive (EX). Once the line is made exclusive, the SCE will indicate to the other caches that the line is invalid and the first cache will be free to write into that line. Once that line has been written, a CH-bit, indicating that the line has been changed, is set. Thereafter, if a second processor P2 requests that line, the change bit remains set even if P2 does not store into that line. This occurs because the cache-to-cache bus is used to transfer the line: the line never goes through main storage, and the CH-bit being on indicates that main storage is not up to date. As long as the change bit remains set, however, the line must be exclusive to only one cache at any time.
With the above approach concurrency may be lost unnecessarily, since a line which could be shared by fetches from different CPUs simultaneously may now be forced to reside in a single cache most of the time. For instance, consider a line L that is frequently accessed and is modified only occasionally during certain time intervals. Once L is modified by a processor it will mostly stay CH among different caches until it is actually replaced from a cache through LRU replacement. Every time a processor issues a data fetch on line L, a cross interrogate (XI) will be necessary if L is in a remote cache.
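The loss of concurrency described above can be made concrete with a small simulation of one line under the prior art CH-bit handling. The class PriorArtLine is a hypothetical simplification: it tracks only which CPs hold the line and whether the CH-bit is on, treats every unchanged line as shareable, and does not model the EX-but-unchanged state.

```python
class PriorArtLine:
    """One cache line as tracked by a storage control element (toy model)."""

    def __init__(self):
        self.holders = set()    # CPs holding a valid copy of the line
        self.changed = False    # CH-bit; while on, exactly one CP holds the line
        self.xi_count = 0       # cross interrogate (XI) actions incurred

    def fetch(self, cp):
        if cp in self.holders:
            return "hit"
        if self.changed:
            # Line is CH in a remote cache: XI plus cache-to-cache transfer.
            # The remote copy is invalidated and the CH-bit stays on.
            self.xi_count += 1
            self.holders = {cp}
            return "xi"
        self.holders.add(cp)    # unchanged line is shared among caches
        return "miss"

    def store(self, cp):
        if self.holders - {cp}:
            self.xi_count += 1  # remote copies must be invalidated first
        self.holders = {cp}
        self.changed = True

    def replace(self, cp):
        if cp in self.holders:
            self.holders.discard(cp)
            if self.changed:
                self.changed = False   # castout brings main storage up to date


# One store by CP 0, then alternating fetches by CP 1 and CP 0.
line = PriorArtLine()
line.store(0)
outcomes = [line.fetch(cp) for cp in (1, 0, 1, 0)]
```

In this model every one of the alternating fetches after the single store incurs an XI, whereas the same fetch pattern on a line that was never changed settles into hits in both caches.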
The above anomaly arises because the CH bit in a directory entry cannot tell whether such a changed line is no longer likely to be modified soon. Such problems often occur, for example, at power on, when initial changes are made to the cache and thereafter all that is usually required is read only (RO) status for the line. The change bit will not be reset until the line ages out and is dumped to main memory.
As caches grow larger and the number of CPUs in a multiprocessor system increases, this problem becomes even greater. With bigger caches, lines do not age out as quickly, and as the number of CPUs increases a line tends to be passed around among the CPUs more before getting the chance to age out.
There are a variety of cache management techniques known in the art. There is however no known art which is directed to minimizing the loss of cache concurrency due to limitations of the use of the change bit. The following is representative art in cache control mechanisms.
U.S. Pat. No. 4,464,712 to Fletcher deals with certain strategies of level 2 cache replacements in a two level cache hierarchy (L1/L2). The L1 caches are the processors' private (local) caches and the L2 is shared by all processors at a second level. An R bit and an L2 block entry are used as indicators of whether the block should be subject to replacement priority. Fletcher proposes some methods for manipulating the R bits on DLAT hits/misses. The present invention is independent of the second level cache (L2) hierarchy and is only concerned with the concurrency of data lines in first level processor local caches.
In U.S. Pat. No. 4,445,174 to Fletcher a performance benefit is claimed on the basis of using a common cache and reducing sharing overhead. The common cache is not used as a second level but rather as a first level cache on top of the processors' private caches. Fletcher '174 proposes deciding whether a line should be moved to the shared cache primarily based upon whether XI is observed and access to the line finds it is CH'd in another processor's private cache. Damaging sharing is detected via a remote CH condition and the line is then put into a shared cache which is assumed to be fast for all processors. Concurrency in the present invention is achieved by allowing a line to reside in different caches in a read only state when a damaging sharing characteristic disappears.
In U.S. Pat. No. 4,181,937 to Hattori et al. an MP cache replacement scheme in a two level cache hierarchy is taught. Upon the decision to replace a block from the L2 shared by all processors, blocks with fewer copies in the first level processor caches are given higher preference. This is supposed to increase concurrency at L1 with better L2 replacement strategies. The present invention is not concerned with L2 replacements.
In U.S. Pat. No. 4,503,497 to Krygowski et al. a cache to cache transfer multiprocessor design is discussed wherein, when a CH'd line is accessed by a remote processor, the line is transferred over as a CH'd and exclusive copy without accessing main storage. The present invention improves on this design and requires an alteration to the CH bit.
In U.S. Pat. No. 4,394,731 to Flusche et al., in cross interrogate situations (that is, a line is remotely EX or CH'd) the line is fetched as EX (not CH) only when it is found CH'd in the remote cache. The cache to cache transfer environment is not discussed in Flusche et al. The present invention provides a capability to do this kind of EX (but also CH) fetch upon remote CH situations for a cache to cache transfer environment. This has several advantages, since when using a cache to cache transfer facility the CH line may be transferred to another cache as EX and CH directly so that main storage update (castout) traffic is eliminated. Further, in this system the condition of the CH line cannot be determined (that is, whether the changes are due to recent stores or whether the changed line was ping-ponged back and forth with unnecessary loss of concurrency).
Accordingly it is an object of the invention to provide a cache coherency mechanism that accounts for how recently a cache line was changed.
It is another object of the invention to avoid unnecessary restriction on the concurrency of a cache line due to its having been modified in the remote past.
It is still another object of the invention to allow a cache line to be read only and available to all processors if it has not been changed recently.
These and other objects, advantages and features of the invention will be more apparent upon reference to the specification and drawings.