A large proportion of the operations performed by a CPU in running a program involve retrieving data from and storing data in main memory. In multiprocessor systems, where several CPUs share a common memory, one or more processors are often forced to wait for access to main memory while another processor has the memory tied up with a read or write operation. It became apparent that overall processing speed could be improved if each processor had its own private cache of data from which it could draw information.
Thus was born the cache memory. Early versions, such as that described in U.S. Pat. No. 3,866,183, assigned to the assignee of the present invention, adopted a "look aside" configuration in which a memory access cycle was started while the cache was checked simultaneously. If the desired data was present in the cache, the memory cycle was aborted and the data was retrieved from the cache. In this fashion, no time was lost in searching the cache prior to starting a memory access. The operations of the cache were called "invisible" to the CPU.
A later version described in U.S. Pat. No. 3,845,474, also assigned to the present assignee, taught a cache with clearing apparatus for use in multiprocessor systems. This cache cleared itself entirely every time its processor entered the common operating system module shared by all the processors. The entire cache was also cleared each time the processor serviced an external interrupt, such as when data was brought into main memory from backing store. This arrangement caused excessive clearing of the cache and slowed operations of the processor by necessitating more accesses to main memory because of a lower "hit" ratio. That is, the probability of finding the desired data in the cache was lower because of the frequency of total clearing of the cache. This reference also describes a "round robin" counter which controls cache store write operations. The first piece of data written into the cache is stored in a first section of a particular block of the cache store, and its associated tag address bits are stored in the corresponding first level of the corresponding column of the directory. Thereafter, each succeeding incoming piece of data is stored in the next sequential location. Thus, for a four level cache, incoming data is stored in sections 1, 2, 3, 4, 1, and so on. As is readily apparent, this scheme takes no account of the relative frequency of usage of the four levels, so that the most frequently used data word in a particular row of blocks could be displaced by an incoming data word after a cache miss.
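The weakness of the round robin counter can be illustrated with a minimal sketch. The class `RoundRobinColumn`, its method names, and the function `demo_round_robin` are hypothetical and are not taken from the cited reference; the sketch models only a single directory column with four levels and assumes the counter advances by one on every fill.

```python
class RoundRobinColumn:
    """One directory column with a fixed number of levels (hypothetical model)."""

    def __init__(self, levels=4):
        self.tags = [None] * levels   # tag address bits held at each level
        self.data = [None] * levels   # cached data held at each level
        self.next_level = 0           # the round robin counter

    def lookup(self, tag):
        """Return the cached data on a hit, or None on a miss."""
        for level, stored in enumerate(self.tags):
            if stored is not None and stored == tag:
                return self.data[level]
        return None

    def fill(self, tag, value):
        """Store incoming data at the counter's level, then advance the counter."""
        level = self.next_level
        self.tags[level] = tag
        self.data[level] = value
        self.next_level = (level + 1) % len(self.tags)
        return level


def demo_round_robin():
    col = RoundRobinColumn(levels=4)
    for tag in ("A", "B", "C", "D"):
        col.fill(tag, tag.lower())
    # "A" is referenced repeatedly, but the counter takes no account of usage.
    for _ in range(3):
        assert col.lookup("A") == "a"
    replaced_level = col.fill("E", "e")  # counter has wrapped back to level 0
    return replaced_level, col.lookup("A")
```

In this sketch the entry for "A" is displaced by the fifth fill even though it was the most frequently referenced entry in the column, which is the behavior criticized above.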
More recently, a means of eliminating unnecessary clearing of the cache has been described in a patent application entitled "Apparatus for Selectively Clearing a Cache Store," Ser. No. 968,223, filed on Dec. 11, 1978, now abandoned, and assigned to the present assignee. This application described an apparatus which used a duplicate directory to compare the tag addresses of the data in the cache to the tag addresses of data in main memory which had been changed by another processor in the system. If a match was found, the location of the obsolete data in the cache was marked as empty. This scheme represented an advance by eliminating much unnecessary clearing of the cache and, to that extent, increased the speed of operation. However, the round robin scheme of making new entries into the cache was used in this reference, so the problem of displacing frequently used data from the cache remained unsolved.
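The selective clearing idea can be sketched as follows. This is an illustration under stated assumptions, not the apparatus of the cited application: the class `DuplicateDirectory`, its methods, and the split of an address into a column index (`address % columns`) and a tag (`address // columns`) are all hypothetical choices made for the example.

```python
class DuplicateDirectory:
    """Copy of the cache directory used to check writes by other processors."""

    def __init__(self, columns, levels=4):
        self.columns = columns
        # tags[column][level] holds a tag, or None when the level is empty
        self.tags = [[None] * levels for _ in range(columns)]

    def split(self, address):
        """Assumed address mapping: low part selects the column, rest is the tag."""
        return address % self.columns, address // self.columns

    def record_fill(self, address, level):
        """Mirror a cache fill so the duplicate stays in step with the cache."""
        column, tag = self.split(address)
        self.tags[column][level] = tag

    def remote_write(self, address):
        """Mark the matching level empty if another processor changed this address.

        Returns (column, level) of the cleared entry, or None if no match.
        """
        column, tag = self.split(address)
        for level, stored in enumerate(self.tags[column]):
            if stored is not None and stored == tag:
                self.tags[column][level] = None
                return column, level
        return None
```

Only the single matching entry is marked empty; every other entry in the cache remains valid, in contrast to the total clearing of the earlier arrangement.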
An apparatus for hierarchical storage of data fetched from memory upon a cache miss is described in U.S. Pat. No. 3,967,247. That reference describes a least recently used scheme for displacing data from the cache when data is retrieved from main memory after a cache miss. A two-bit age tag is assigned to each of the four levels of the cache to indicate the relative times of last reference to each piece of data. After each match, a network of comparators examines the level matched and the age bits from each level. The network then updates the age bits for each level using a network of adders. This algorithm is different from that of the present invention, and the implementing hardware is more complicated, slower and more costly.
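A generic age-tag update of this kind can be sketched as follows. This is a common textbook formulation offered only as an illustration; it is not asserted to be the exact comparator and adder network of the cited patent. The function names `touch` and `victim` are hypothetical, and the sketch assumes the four ages always form a permutation of 0 through 3, so that each fits in two bits.

```python
def touch(ages, hit_level):
    """Return the updated age tags after a reference to hit_level.

    Every level younger than the matched level ages by one (the "adder" step,
    chosen by comparing each age with the matched level's age), and the
    matched level becomes the youngest, with age 0.
    """
    hit_age = ages[hit_level]
    updated = []
    for level, age in enumerate(ages):
        if level == hit_level:
            updated.append(0)
        elif age < hit_age:
            updated.append(age + 1)
        else:
            updated.append(age)
    return updated


def victim(ages):
    """Return the level holding the oldest (least recently used) entry."""
    return ages.index(max(ages))
```

Starting from ages `[0, 1, 2, 3]`, a match on level 2 gives `[1, 2, 0, 3]`, and level 3 remains the candidate for displacement on the next miss.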
Thus a need existed for a cache memory suitable for use in a multiprocessor system which would not be completely cleared each time another processor changed a data word in main memory, and which would displace only the least recently used item in a column of data after a cache miss.