Most modern computer systems include a central processing unit (CPU) and a main memory. The speed at which the CPU can decode and execute instructions to process data has for some time exceeded the speed at which instructions and operands can be transferred from main memory to the CPU. In an attempt to reduce the problems caused by this mismatch, many computer systems include a cache memory buffer between the CPU and main memory.
A cache memory is a small, high-speed buffer memory used to hold temporarily those portions of main memory that are expected to be used by the CPU in the near future. Its main purpose is to shorten the time needed to perform memory accesses, whether for data or for instruction fetch. Information located in cache memory can be accessed in much less time than information located in main memory, so a CPU with a cache memory spends far less time waiting for instructions and operands to be fetched and/or stored. For example, in typical large, high-speed computers, main memory can be accessed in 300 to 600 nanoseconds, whereas information can be obtained from a cache memory in 50 to 100 nanoseconds. For such machines, the cache memory produces a very substantial increase in execution speed, but the instruction execution rate is still limited by cache memory access time. Further increases in instruction execution rate can be gained by further decreasing the cache memory access time.
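The effect of these figures can be estimated with a commonly used simplified model, which is an assumption here since the text gives no formula: if h is the hit ratio, each hit costs the cache access time and each miss costs the main memory access time, so the average access time is h·t_cache + (1 − h)·t_main. A minimal sketch, using a hypothetical hit ratio:

```python
def effective_access_time(hit_ratio: float, t_cache_ns: float, t_main_ns: float) -> float:
    """Average memory access time under a simplified model (an assumption):
    a hit costs t_cache_ns, a miss costs t_main_ns, and nothing else is counted."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t_cache_ns + (1.0 - hit_ratio) * t_main_ns


# 50 ns cache and 300 ns main memory come from the ranges quoted above;
# the 0.95 hit ratio is a hypothetical value chosen for illustration.
print(effective_access_time(0.95, 50, 300))
```

Under these assumptions the average is about 62.5 ns. As the hit ratio approaches 1, the average approaches the cache access time itself, which is why that time continues to bound the instruction execution rate.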
A cache memory is made up of many blocks of one or more words of data. Each block has associated with it an address tag that uniquely identifies which block of main memory it is a copy of. Each time the processor makes a memory reference, the cache makes an address tag comparison to see if it has a copy of the requested data. If it does, it supplies the data. If it does not, it retrieves the block from main memory to replace one of the blocks stored in the cache, then supplies the data to the processor.
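The lookup just described can be sketched as follows. The block size, the capacity, and the policy of evicting the oldest block on a miss are hypothetical choices for illustration; each block's tag here is simply its block number in main memory, and every stored tag is compared against the requested one.

```python
from dataclasses import dataclass, field

BLOCK_WORDS = 4  # hypothetical block size, in words


@dataclass
class Block:
    tag: int          # number of the main-memory block this is a copy of
    words: list[int]  # copied contents of that block


@dataclass
class TinyCache:
    capacity: int                                      # blocks the cache can hold
    blocks: list[Block] = field(default_factory=list)

    def read(self, address: int, main_memory: list[int]) -> tuple[int, bool]:
        """Return (word, hit) for a word address."""
        tag, offset = divmod(address, BLOCK_WORDS)
        # Compare the requested tag against the tag of every stored block.
        for block in self.blocks:
            if block.tag == tag:
                return block.words[offset], True  # hit: supply the data
        # Miss: retrieve the whole block from main memory.
        start = tag * BLOCK_WORDS
        new_block = Block(tag, main_memory[start:start + BLOCK_WORDS])
        if len(self.blocks) >= self.capacity:
            self.blocks.pop(0)  # placeholder: evict the oldest block
        self.blocks.append(new_block)
        return new_block.words[offset], False


memory = list(range(100, 164))  # 64 words of hypothetical main memory
cache = TinyCache(capacity=2)
print(cache.read(5, memory))  # (105, False): miss, block 1 is fetched
print(cache.read(6, memory))  # (106, True): hit in the same block
```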
Optimizing the design of a cache memory generally has four aspects:
(1) maximizing the probability of finding a memory reference's information in the cache (the so-called "hit" ratio),
(2) minimizing the time required to access information that is indeed in the cache (access time),
(3) minimizing the delay due to a cache "miss", and
(4) minimizing the overhead of updating main memory and maintaining multicache consistency.
All of these objectives must be accomplished under cost constraints and in view of the interrelationship between the parameters, for example, the trade-off between "hit" ratio and access time.
It is obvious that the larger the cache memory, the higher the probability of finding the needed information in it. Cache sizes cannot be expanded without limit, however, for several reasons: cost, which is the most important reason in many machines, especially small ones; physical size, since the cache must fit on the boards and in the cabinets; and access time, since the larger the cache, the slower it becomes.
Information is generally retrieved from a cache associatively to determine whether there is a "hit". However, large associative memories are both very expensive and somewhat slow. In early cache memories, all the elements were searched associatively for each request by the CPU. To provide the access time required to keep up with the CPU, cache size was limited, and the hit ratio was thus rather low.
More recently, cache memories have been organized into groups of smaller associative memories called sets. Each set contains a number of locations, referred to as the set size. For a cache of size m divided into L sets, there are s = m/L locations in each set. When an address in main memory is mapped into the cache, it is assigned a single index and can appear at that index in any of the L sets. For a cache of a given size, searching each of the sets in parallel can improve access time by a factor of L. However, the time to complete the required associative search is still undesirably long.
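A minimal sketch of this organization follows, using the text's terminology (many textbooks call each of these L arrays a "way"). How an address is reduced to a block number, and the use of the block number modulo s as the index, are assumptions made for illustration; the loop over the L sets stands in for the parallel tag comparison a hardware cache would perform.

```python
from typing import Optional


class SetOrganizedCache:
    """L sets, each holding s = m // L block locations (m = total locations).

    A block maps to one index and may reside at that index in any of the
    L sets. Only tags are modeled; data storage is omitted.
    """

    def __init__(self, m: int, num_sets: int):
        if m % num_sets != 0:
            raise ValueError("m must be a multiple of L")
        self.L = num_sets
        self.s = m // num_sets  # locations per set
        # tags[k][i] is the tag stored at index i of set k (None = empty)
        self.tags: list[list[Optional[int]]] = [[None] * self.s for _ in range(self.L)]

    def split(self, block_number: int) -> tuple[int, int]:
        # Assumed mapping: the index selects one location in every set,
        # and the tag distinguishes blocks that share that index.
        return block_number % self.s, block_number // self.s

    def lookup(self, block_number: int) -> Optional[int]:
        index, tag = self.split(block_number)
        # Hardware compares all L tags at this index in parallel; here we loop.
        for k in range(self.L):
            if self.tags[k][index] == tag:
                return k  # hit: number of the set holding the block
        return None  # miss


cache = SetOrganizedCache(m=8, num_sets=2)  # s = 4 locations per set
cache.tags[1][3] = 2        # block number 11 (index 3, tag 2) placed in set 1
print(cache.lookup(11))     # 1
print(cache.lookup(7))      # None: index 3 with tag 1 is not present
```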
The operation of cache memories to date has been based upon the assumption that, because a particular memory location has been referenced, that location and locations very close to it are very likely to be accessed in the near future. This is often referred to as the property of locality, which has two aspects, temporal and spatial. Although over short periods of time a program distributes its memory references nonuniformly over its address space, the portions of the address space that are favored remain largely the same for long periods of time. This first property, called temporal locality, or locality by time, means that the information that will be in use in the near future is likely to be in use already. This type of behavior can be expected from certain program structures, such as loops, in which both data and instructions are reused. The second property, locality by space, means that the portions of the address space that are in use generally consist of a fairly small number of individually contiguous segments of that address space. Locality by space, then, means that the loci of reference of the program in the near future are likely to be near the current loci of reference. This type of behavior can be expected from common knowledge of program structure: related data items (variables, arrays) are usually stored together, and instructions are mostly executed sequentially. Since the cache memory retains segments of information that have been recently used, the property of locality implies that needed information is also likely to be found in the cache. See Smith, A. J., "Cache Memories," ACM Computing Surveys, 14:3 (Sept. 1982), pp. 473-530.
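The effect of locality on the hit ratio can be seen in a small simulation. The three traces, the 4-word block size, the 8-block capacity, and the oldest-first eviction are hypothetical choices made only to keep the sketch short:

```python
from collections import deque


def hit_ratio(addresses, block_words=4, capacity_blocks=8):
    """Fraction of word references that hit in a small block cache."""
    resident = deque()  # block numbers currently cached, oldest first
    hits = 0
    for a in addresses:
        block = a // block_words
        if block in resident:
            hits += 1
        else:
            if len(resident) == capacity_blocks:
                resident.popleft()  # evict the oldest block (for simplicity)
            resident.append(block)
    return hits / len(addresses)


sequential = list(range(64))                       # consecutive words
loop = list(range(16)) * 10                        # a 16-word region reused
scattered = [(i * 97) % 4096 for i in range(64)]   # widely spread words

for name, trace in (("sequential", sequential), ("loop", loop), ("scattered", scattered)):
    print(name, hit_ratio(trace))
```

The sequential sweep hits on three of every four references because each fetched block also brings in its neighbors (locality by space); the repeated region hits on almost every reference once it is resident (locality by time); the scattered trace never touches the same block twice and so never hits.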
If a cache has more than one set, as described above, then when there is a cache miss the cache must decide which of several blocks of information should be swapped out to make room for the new block being retrieved from main memory. To decide which block will be swapped out, different caches use different replacement schemes.
The most commonly used replacement scheme is Least Recently Used ("LRU"). Under the LRU replacement scheme, for each group of blocks at a particular index, the cache maintains several status bits that keep track of the order in which these blocks were last accessed. Each time one of the blocks is accessed, it is marked most recently used and the others are adjusted accordingly. When there is a cache miss, the block swapped out to make room for the block being retrieved from main memory is the block that was least recently used.
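The text describes the recency information as status bits; the sketch below instead keeps an explicit list of set numbers ordered from least to most recently used, which carries the same ordering information in a form that is easier to read. The class and method names are hypothetical.

```python
class LRUIndexGroup:
    """Recency order for the L blocks at one cache index."""

    def __init__(self, num_sets: int):
        # order[0] is the least recently used set, order[-1] the most recent.
        self.order = list(range(num_sets))

    def touch(self, set_number: int) -> None:
        # Mark set_number most recently used; the others shift down accordingly.
        self.order.remove(set_number)
        self.order.append(set_number)

    def victim(self) -> int:
        # On a miss, the least recently used block is replaced.
        return self.order[0]


group = LRUIndexGroup(4)
for s in (2, 0, 3, 2):
    group.touch(s)
print(group.victim())  # 1: set 1 has not been accessed since initialization
```

After the new block is loaded into the victim's set, the cache would call `touch` on that set so that it becomes the most recently used.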
Other replacement schemes in use are First In First Out (FIFO) and random replacement, the nomenclature being self-explanatory.
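Unlike LRU, neither of these schemes needs to update any status when a block is hit. A minimal sketch of both follows; the class names and the round-robin pointer used to realize FIFO are illustrative assumptions, and the FIFO version assumes blocks at an index are only ever loaded into the victim it chooses.

```python
import random
from typing import Optional


class FIFOIndexGroup:
    """First In First Out for the L blocks at one index.

    Because every new block is loaded into the chosen victim, a round-robin
    pointer reproduces load order. Hits do not change the order.
    """

    def __init__(self, num_sets: int):
        self.num_sets = num_sets
        self._pointer = 0  # set to be replaced next (the oldest block)

    def choose_victim(self) -> int:
        victim = self._pointer
        self._pointer = (victim + 1) % self.num_sets  # advance to the next-oldest
        return victim


class RandomIndexGroup:
    """Random replacement: any of the L blocks at the index may be chosen."""

    def __init__(self, num_sets: int, seed: Optional[int] = None):
        self.num_sets = num_sets
        self._rng = random.Random(seed)

    def choose_victim(self) -> int:
        return self._rng.randrange(self.num_sets)


fifo = FIFOIndexGroup(3)
print([fifo.choose_victim() for _ in range(4)])  # [0, 1, 2, 0]
rand = RandomIndexGroup(3, seed=1)
print([rand.choose_victim() for _ in range(4)])
```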
Contrary to the above-stated assumption, however, not all computer data structures have the same kind of data locality. For some simple structures, such as data stacks or sequential data, an LRU replacement scheme is not optimal. Nevertheless, in cache memory structures used in the past, in accordance with the basic assumption that the data most likely to be referenced is the data referenced most recently or data close to it in physical address, no provision has been made in cache memory operation for deviating from the standard data replacement scheme.
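The sequential-data case can be illustrated with a short simulation, assuming for simplicity a fully associative LRU cache and a trace that repeatedly sweeps through one more block than the cache can hold. The trace and the capacity are hypothetical:

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()  # keys are block numbers, least recently used first
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark most recently used
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return hits


# A sequential sweep over 5 blocks, repeated 10 times, with room for only 4.
trace = [0, 1, 2, 3, 4] * 10
print(lru_hits(trace, capacity=4))  # 0
```

Every reference misses, because LRU always evicts exactly the block the sweep will need next. A policy that simply kept blocks 0 through 2 resident and rotated blocks 3 and 4 through the remaining location would instead hit on three of every five references after the first sweep.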