Most modern computer systems include a central processing unit (CPU) and a main memory. The speed at which the CPU can decode and execute instructions to process data has for some time exceeded the speed at which instructions and operands can be transferred from main memory to the CPU. In an attempt to reduce the problems caused by this mismatch, many computers include a cache memory or buffer between the CPU and main memory.
Cache memories are small, high-speed buffer memories used to hold temporarily those portions of the contents of main memory which are believed to be currently in use by the CPU. The main purpose of caches is to shorten the time necessary to perform memory accesses, either for data or instruction fetch. Information located in cache memory may be accessed in much less time than that located in main memory. Thus, a CPU with a cache memory needs to spend far less time waiting for instructions and operands to be fetched and/or stored. For such machines the cache memory produces a very substantial increase in execution speed.
A cache is made up of many blocks of one or more words of data, each of which is associated with an address tag that uniquely identifies which block of main memory it is a copy of. Each time the processor makes a memory reference, the cache checks to see if it has a copy of the requested data. If it does, it supplies the data; otherwise, it gets the block from main memory, replacing one of the blocks stored in the cache, then supplies the data to the processor. See, Smith, A. J., Cache Memories, ACM Computing Surveys, 14:3 (Sept. 1982), pp. 473-530.
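The lookup just described can be sketched as follows. This is only an illustrative model, not taken from the cited survey: it assumes a direct-mapped organization (each memory block can occupy exactly one cache slot), and the class name, block size, and slot count are hypothetical choices made for the example.

```python
class DirectMappedCache:
    """Toy direct-mapped cache; all sizes and names are illustrative assumptions."""

    def __init__(self, memory, num_blocks=4, words_per_block=2):
        self.memory = memory                  # main memory modeled as a list of words
        self.num_blocks = num_blocks
        self.words_per_block = words_per_block
        self.slots = [None] * num_blocks      # each slot: (tag, block_data) or None
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_number = address // self.words_per_block
        offset = address % self.words_per_block
        index = block_number % self.num_blocks   # which slot the block maps to
        tag = block_number // self.num_blocks    # identifies the block within that slot
        slot = self.slots[index]
        if slot is not None and slot[0] == tag:
            self.hits += 1                       # the cache holds a copy: supply it
        else:
            self.misses += 1                     # fetch the block, replacing the old one
            start = block_number * self.words_per_block
            block = self.memory[start:start + self.words_per_block]
            slot = (tag, block)
            self.slots[index] = slot
        return slot[1][offset]


memory = list(range(100, 132))                   # 32 words of main memory
cache = DirectMappedCache(memory)
values = [cache.read(a) for a in (0, 1, 0, 8, 0)]
```

In this run, addresses 0 and 8 map to the same slot, so the reference to address 8 displaces the block holding address 0 and the final reference to address 0 misses again.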
Optimizing the design of a cache memory generally has four aspects:
(1) maximizing the probability of finding a memory reference's target in the cache (the hit ratio),
(2) minimizing the time to access information that is indeed in the cache (access time),
(3) minimizing the delay due to a miss, and
(4) minimizing the overheads of updating main memory and maintaining multicache consistency.
All of these objectives are to be accomplished under suitable cost constraints and in view of the inter-relationship between the parameters.
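The interaction between the first three objectives can be illustrated with a common first-order model of average access time. The model and the numbers below are assumptions introduced for illustration and do not come from the source; the function and parameter names are hypothetical.

```python
def average_access_time(hit_ratio, cache_time, miss_penalty):
    """First-order model: every reference pays the cache access time,
    and the fraction that misses also pays the miss penalty."""
    return cache_time + (1.0 - hit_ratio) * miss_penalty


# Hypothetical figures: 1-cycle cache access, 10-cycle miss penalty.
baseline = average_access_time(hit_ratio=0.95, cache_time=1.0, miss_penalty=10.0)

# A design that raises the hit ratio but lengthens the cache access time
# can end up slower overall, which is why the parameters must be traded off.
slower_cache = average_access_time(hit_ratio=0.97, cache_time=1.5, miss_penalty=10.0)
```

Under these assumed figures the baseline averages about 1.5 cycles per reference, while the second design, despite its higher hit ratio, averages about 1.8.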
When the CPU executes instructions that modify the contents of the current address space, those changes must eventually be reflected in main memory; the cache is only a temporary buffer. There are two general approaches to updating main memory: stores can be transmitted directly to main memory (referred to as write-through or store-through), or stores can initially modify the data stored in the cache, and can later be reflected in main memory (copy-back or write-to). The choice between write-through and copy-back strategies also has implications in the choice of a method for maintaining consistency among the multiple cache memories in a tightly coupled multiprocessor system.
A major disadvantage of the write-through approach is that it requires a main memory access on every store. This adds significantly to the traffic on the relatively slow main memory, which is the very load the cache is intended to minimize, and thereby slows the execution rate of the processor. However, when write-through is not used, the problem of cache consistency arises because main memory does not always contain an up-to-date copy of all the information in the system.
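The difference in main memory traffic between the two policies can be sketched by counting memory writes for a stream of stores. This is a simplified illustration under stated assumptions: the function names and the store stream are hypothetical, and the copy-back count assumes the cache is large enough that no modified block is evicted before it is finally written back.

```python
def memory_writes_write_through(store_addresses):
    """Write-through: every store is transmitted to main memory."""
    return len(store_addresses)


def memory_writes_copy_back(store_addresses, words_per_block=4):
    """Copy-back: stores modify the cached block; each modified block is
    written to main memory once, when it is eventually removed.
    Assumes no modified block is evicted and re-modified in between."""
    modified_blocks = {addr // words_per_block for addr in store_addresses}
    return len(modified_blocks)


# Hypothetical store stream with repeated stores to two blocks.
stores = [0, 1, 2, 3, 0, 1, 16, 17, 16]
wt = memory_writes_write_through(stores)   # one word write per store
cb = memory_writes_copy_back(stores)       # one block write per modified block
```

Note that the two counts are in different units (word writes versus block writes), so the comparison indicates the number of memory accesses rather than the number of words transferred. Between the first store and the eventual write-back, main memory holds stale data for the modified blocks, which is the consistency problem noted above.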
Input and output between the main memory and peripheral devices are an additional source of references to the information in main memory, and these references must be harmonized with the operation of the cache memories. It is important that an output request stream reference the most current values for the information transferred. Similarly, it is important that input data be immediately reflected in every copy of the affected lines, wherever those copies reside.
There have been several approaches to solving this problem. One is to direct the I/O stream through the cache itself. This method is limited to single-processor systems. Further, it interferes significantly with the processor's use of the cache, both by keeping the cache busy when the processor needs it and by displacing blocks of information currently being used by the processor with blocks from the I/O stream. Thus it degrades both the cache access time and the hit ratio. An alternative approach is to use a write-through policy and broadcast all writes so as to update or invalidate the target line wherever it is found. Although this method directs I/O to main memory instead of the cache, it suffers from the disadvantages of the write-through strategy discussed above. In addition, this hardware-intensive solution is expensive to implement and increases the cache access cycle time by requiring the cache to check for invalidation. This is particularly disadvantageous in multiprocessor systems because every cache memory in the system can be forced to surrender a cycle to invalidation lookup whenever any processor in the system performs a store.
Another alternative is to implement a directory to keep track of the location and status of all copies of each block of data. The directory can be centralized in main memory or distributed among the caches, I/O channels and main memory. This system ensures that at any time only one processor or I/O channel is capable of modifying any block of data. See, Tang, C. K., Cache Design in the Tightly Coupled Multiprocessor System, AFIPS Proc., N.C.C., vol. 45, pp. 749-53 (1976). The major disadvantage of the directory control system is the complexity and expense of the additional hardware it requires.
Finally, if a processor fails, for instance because of a power interruption, the memory system must ensure that the most current copies of information are stored in main memory, so that recovery can be more easily accomplished.
It is an object of this invention to provide a system for maintaining the memory integrity and consistency in a computer system having cache memories, placing the burden of maintaining integrity on the software, thus allowing the hardware to remain relatively simple, cheap and fast.
It is also an object of this invention to minimize the impact of the overhead for maintaining memory integrity and consistency on the operation of the cache memories, so that the cache access time and miss ratio can be minimized.
These and other objects of the invention are accomplished in a computer having an instruction set including explicit instructions for controlling the invalidation or removal of blocks of data in the cache memories. Each block of data stored in the caches has two one-bit status flags, a valid bit to indicate whether the block contains up-to-date information, and a dirty bit to indicate whether the data in the block has been stored to by the processor since it entered the cache. The instruction set includes instructions for removing a block with a particular address from the cache and writing it back to memory if necessary, for removing a block without writeback to main memory, for suspending execution of instructions until pending cache control operations are completed, and for efficiently removing and writing back to main memory all "dirty" blocks in the cache in case of a processor failure. The operating system software invokes these instructions in situations which could result in inconsistent or stale data in the cache memories.
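The following is a minimal software model of the scheme summarized above, offered only as a sketch. The operation names (`flush_block`, `purge_block`, `sync`, `flush_all_dirty`) are hypothetical stand-ins for the four kinds of instructions described, not the actual instruction mnemonics, and the queue of pending write-backs is an assumption about how cache control operations might complete asynchronously.

```python
class Block:
    def __init__(self, data):
        self.data = data
        self.valid = True    # block contains up-to-date information
        self.dirty = False   # block has been stored to since it entered the cache


class SoftwareManagedCache:
    """Toy copy-back cache whose consistency is managed by explicit operations."""

    def __init__(self, memory):
        self.memory = memory   # dict: block address -> data (main memory)
        self.blocks = {}       # dict: block address -> Block
        self.pending = []      # assumed queue of write-backs not yet completed

    def load(self, addr):
        block = self.blocks.get(addr)
        if block is None or not block.valid:
            block = Block(self.memory[addr])   # miss: fetch from main memory
            self.blocks[addr] = block
        return block.data

    def store(self, addr, data):
        self.load(addr)                        # ensure a valid copy is cached
        block = self.blocks[addr]
        block.data = data
        block.dirty = True                     # main memory is now stale

    def flush_block(self, addr):
        """Remove a block, writing it back only if necessary."""
        block = self.blocks.get(addr)
        if block is not None and block.valid:
            if block.dirty:
                self.pending.append((addr, block.data))
            block.valid = False
            block.dirty = False

    def purge_block(self, addr):
        """Remove a block without write-back."""
        block = self.blocks.get(addr)
        if block is not None:
            block.valid = False
            block.dirty = False

    def sync(self):
        """Wait until pending cache control operations have completed."""
        for addr, data in self.pending:
            self.memory[addr] = data
        self.pending.clear()

    def flush_all_dirty(self):
        """Remove and write back every dirty block, e.g. on processor failure."""
        for addr, block in list(self.blocks.items()):
            if block.valid and block.dirty:
                self.flush_block(addr)
        self.sync()


memory = {0x10: "old A", 0x20: "old B", 0x30: "old C"}
cache = SoftwareManagedCache(memory)

# Before output from block 0x10, software flushes it so the device sees current data.
cache.store(0x10, "new A")
cache.flush_block(0x10)
cache.sync()

# After input into block 0x20, software purges the stale cached copy.
cache.load(0x20)
memory[0x20] = "input B"       # device writes main memory directly
cache.purge_block(0x20)
reloaded = cache.load(0x20)

# On an impending failure, all dirty blocks are written back.
cache.store(0x30, "new C")
cache.flush_all_dirty()
```

In this sketch the hardware model never checks for consistency on its own; the calls placed around the simulated I/O transfers play the role of the operating system invoking the cache control instructions.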