The microprocessors used in current personal computers operate upon data at very high speeds. This is particularly true for superscalar microprocessors that can operate on more than one instruction at a time. It is not economically feasible to construct the entire computer memory system to operate at the same rate as the microprocessor. Further, it is not necessary to construct such a memory system. Microprocessors employ data or instruction caches based upon an assumption of locality. Once particular data or a particular instruction has been referenced from main memory, it is normally the case that nearby data or instructions will be referenced in the near future. It is feasible to construct a small and fast memory to temporarily store such data or instructions. This small fast memory is called a cache. It is typical to recall data from the main memory in minimum sizes larger than the minimum addressable memory size. Such memory recalls may be via a data bus wider than the minimum addressable data size or via bursts of plural memory accesses or both. Such recall of adjacent data also serves the locality assumption by recalling data from nearby addresses that are likely to be referenced in the near future. Memory caches store their data with an indication of the corresponding main memory address.
Each memory reference by the microprocessor is tested against these cache address indications to determine if the referenced address is cached. If the referenced address is stored in the cache, called a cache hit, then the memory access takes place within the cache rather than the main memory. Since memory access to the cache is faster than access to the main memory, each cache hit represents a gain in memory access speed. Note that such memory accesses may be made for both reads of the memory and writes to the memory. If the access is a write, this write takes place in the copy of the data stored in the cache. This cache entry is then marked as dirty. This means that it contains data that differs from the data in the corresponding address in the main memory and the cache data corresponds to the state called for in the current program. If the referenced address is not stored in the cache, called a cache miss, then the main memory must be accessed. In a read access, the microprocessor operation unit needing the data must stall until the data is returned from the slower main memory. When recalled, this main memory data is both supplied to the requesting microprocessor operation unit and stored in the cache. Some microprocessors do not cache data on a cache miss for a write access. These microprocessors merely write to the main memory. In superscalar microprocessors it is likely that other useful tasks may be performed while this write to main memory takes place. Other microprocessors employ a cache write allocation policy by recalling data on a cache miss for a write access. The data at the memory address to be written is recalled and stored in the cache. The memory write then takes place into the corresponding cache location. This cache entry is marked dirty indicating that the cache entry differs from the copy in the main memory.
A write allocation policy is based upon the assumption that, following a write access that misses the cache, the memory location will need to be accessed again within a short time for a read or a write. If this is true, then the subsequent accesses take place within the cache and increase the speed of memory access.
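The hit, miss and write allocation behavior described above can be sketched in a few lines. This is a minimal illustration only, not a description of any particular microprocessor. The class name `SimpleCache`, its methods and the use of a dictionary to stand in for main memory are assumptions made for the example, and cache capacity and replacement are ignored.

```python
class SimpleCache:
    """Toy cache keyed by address (hypothetical model; capacity is unlimited)."""

    def __init__(self, memory, write_allocate=True):
        self.memory = memory                  # dict standing in for main memory
        self.write_allocate = write_allocate  # policy used on a write miss
        self.lines = {}                       # address -> [value, dirty flag]

    def read(self, addr):
        if addr in self.lines:                # cache hit: serve from the cache
            return self.lines[addr][0], "hit"
        value = self.memory[addr]             # cache miss: recall from main memory
        self.lines[addr] = [value, False]     # store a clean copy in the cache
        return value, "miss"

    def write(self, addr, value):
        if addr in self.lines:                # write hit: update the cached copy
            self.lines[addr] = [value, True]  # and mark the entry dirty
            return "hit"
        if self.write_allocate:
            # Write miss with allocation: recall the data, then write into
            # the cache and mark the entry dirty.
            self.lines[addr] = [self.memory[addr], False]
            self.lines[addr] = [value, True]
        else:
            # Write miss without allocation: write to main memory only.
            self.memory[addr] = value
        return "miss"
```

With write allocation, a write to an uncached address followed by a read of the same address yields a miss and then a hit, while main memory still holds the old value. Without write allocation, the same sequence yields two misses and main memory is updated immediately.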
Whatever the size of the memory cache, the microprocessor will eventually fill it. Upon the next following cache miss, a cache entry must be cleared to enable the missed data from the main memory to be stored. Memory caches typically employ a least recently used algorithm. Along with the corresponding memory address and an indication of whether the cache entry is dirty, the memory cache must store an indication of the last use of the cache entry. The cache entry to be replaced is the least recently used cache entry. This is based upon the assumption that the least recently used cache entry is the least likely to be needed again in the near future.
This cache entry replacement process begins with a cache miss because the requested data is not in the cache. The cache controller must determine which cache entry to replace with the newly required data. If the least recently used cache entry is clean, that is, it corresponds exactly to the data in the main memory, then this cache entry is overwritten. Because the cache entry is the same as the data stored in the corresponding location in memory, this overwrite does not lose the program state. If the least recently used cache entry is dirty, then the cache entry holds data different from the corresponding memory location. In this case the cache entry holds the program state and overwriting this data would be improper. This cache entry must be evicted, that is, it must be written out to the main memory before the cache entry may be reused.
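The replacement process just described may be illustrated by the following sketch. It assumes a fully associative cache and exact least recently used ordering. The names `LRUCache`, `access` and `evictions` are hypothetical and are introduced only for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least recently used replacement."""

    def __init__(self, memory, capacity):
        self.memory = memory          # dict standing in for main memory
        self.capacity = capacity      # number of cache entries
        self.lines = OrderedDict()    # address -> [value, dirty]; least recent first
        self.evictions = []           # addresses that had to be written back

    def _make_room(self):
        if len(self.lines) < self.capacity:
            return                    # a free entry is still available
        victim, (value, dirty) = self.lines.popitem(last=False)
        if dirty:
            # A dirty entry holds the program state, so it is written out
            # to main memory before the entry is reused.
            self.memory[victim] = value
            self.evictions.append(victim)
        # A clean entry matches main memory and is simply overwritten.

    def access(self, addr, value=None):
        """Read addr, or write value to addr when value is given."""
        if addr not in self.lines:    # cache miss
            self._make_room()
            self.lines[addr] = [self.memory[addr], False]
        self.lines.move_to_end(addr)  # record this entry as most recently used
        if value is not None:
            self.lines[addr] = [value, True]
        return self.lines[addr][0]
```

In a two-entry cache holding one dirty and one clean entry, a miss that replaces the clean entry causes no write to main memory, whereas a miss that replaces the dirty entry first writes that entry back.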
The need to evict dirty cache entries may cause the microprocessor to stall. This situation occurs only on a cache miss, which typically indicates that a microprocessor execution unit requires the data. This situation may occur either on a memory read or on a memory write when using a write allocate policy. However, the requested data from the main memory cannot be stored until the dirty cache entry is written to memory. Thus the microprocessor requires new data but must wait for old data to be written to memory before the new data can be recalled from the memory. It is known in the art to provide a write-back buffer to deal with this problem. The write-back buffer is a first-in-first-out buffer of cache entries that are scheduled to be written to memory. Each entry in the write-back buffer includes the cache entry data and the corresponding main memory address. When the memory bus is free, the data from the oldest entry in the write-back buffer is written to the main memory at the corresponding address. Upon completion of this write to main memory, a write-back buffer entry is freed to store another evicted cache entry. The write-back buffer enables the required memory write to be delayed until after the memory read. Because a microprocessor operation unit is waiting for the memory read to complete, the advancement of the read before the write permits microprocessor operation to continue past the cache entry eviction.
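A write-back buffer of this kind may be modeled as a bounded queue. The sketch below is illustrative only. The names `WriteBackBuffer`, `push` and `drain_one` are hypothetical, and the sketch assumes that the entry written to memory first is the one that has waited longest, which is how it reads "first-in-first-out".

```python
from collections import deque


class WriteBackBuffer:
    """Toy first-in-first-out buffer of evicted dirty cache entries."""

    def __init__(self, memory, depth):
        self.memory = memory      # dict standing in for main memory
        self.depth = depth        # number of entries the buffer can hold
        self.fifo = deque()       # (address, data) pairs, oldest at the left

    def push(self, addr, data):
        """Queue an evicted entry. Returns True if the processor had to stall."""
        stalled = False
        if len(self.fifo) >= self.depth:
            # Buffer full: one write-back must complete before the new
            # eviction can be accepted, so the pending read is delayed.
            self.drain_one()
            stalled = True
        self.fifo.append((addr, data))
        return stalled

    def drain_one(self):
        """Called when the memory bus is free: write the oldest entry to memory."""
        if self.fifo:
            addr, data = self.fifo.popleft()
            self.memory[addr] = data
```

With a depth of two, the first two evictions are accepted without any write to memory, so the corresponding reads can proceed at once. A third eviction finds the buffer full and forces the oldest entry to be written first.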
There are occasions when the assumption of locality of reference fails. One of these times is a context switch when the microprocessor changes from its current task to another task. This may occur when loading a new program, changing between parts of a single program or servicing an interrupt. In these cases memory operation shifts from an original address block to another address block. Such a shift in memory reference requires a large amount of data from the main memory to be cached in a relatively short period. This requires replacement of a large portion of contents of the memory cache. Thus during a context switch large numbers of dirty cache entries may need to be evicted. A write-back buffer only delays the need for writing to memory. Once the write-back buffer is full, then a write-back of a dirty cache entry must occur before any required memory read. In a context switch this situation often occurs repeatedly, slowing the operation of the microprocessor each time.
Increasing the depth of the write-back buffer FIFO tends to reduce this problem. With sufficient depth the write-backs may be delayed until the new task begins to reference cached data. Each time the new task generates a cache hit, a memory bus cycle is not needed to recall the requested data. This frees a memory bus cycle for a write-back from the write-back buffer. The larger the write-back buffer, the more likely that the write-backs will be delayed until their required memory bus cycles can be hidden behind cache hits. However, the write-back buffer requires relatively large amounts of area in the integrated circuit embodying the microprocessor. Each entry in the write-back buffer must include an entire cache entry, which may be from 64 bits to 256 bits wide or more. Each entry in the write-back buffer also needs the address of the beginning of the data, which is often 27 to 29 bits. In addition, the address of each entry in the write-back buffer is typically compared with the address of any memory access. The data write-back is generally aborted if the address of a memory access matches any address within the write-back buffer. A match on a memory read means that data within the write-back buffer is needed by an operational unit of the microprocessor. It saves time to obtain this data from the write-back buffer rather than writing it to main memory, generating a cache miss on the read access and then reading it back into the cache. A match on a memory write means that the data within the write-back buffer is to be altered. Thus the data in the write-back buffer will be the wrong data to write to the memory. In addition, in either case, an access to data in the write-back buffer means that this data is no longer the least recently used. Thus another cache entry should be replaced rather than the cache entry in the write-back buffer.
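The address compare and the aborted write-back may be sketched as follows. The sketch assumes 32-bit byte addresses, which the text does not state. Under that assumption a 64-bit (8-byte) entry needs 29 address bits and a 256-bit (32-byte) entry needs 27, consistent with the range given above. The function names are hypothetical, and on a write match the whole entry is replaced for simplicity.

```python
ADDRESS_BITS = 32  # assumed byte-address width (not stated in the text)


def line_address(byte_addr, line_bytes):
    """Drop the offset bits so every byte of one cache entry compares equal."""
    offset_bits = line_bytes.bit_length() - 1   # log2 of a power-of-two size
    return byte_addr >> offset_bits


def line_address_bits(line_bytes):
    """Number of address bits stored with each write-back buffer entry."""
    return ADDRESS_BITS - (line_bytes.bit_length() - 1)


def check_access(buffer, byte_addr, line_bytes, write_value=None):
    """Compare a memory access against every pending write-back.

    buffer is a list of [line address, data] entries. On a match the entry
    is removed from the buffer (its write-back is aborted) and returned so
    that it can be kept in the cache. Returns None when nothing matches.
    """
    target = line_address(byte_addr, line_bytes)
    # Hardware would perform these compares in parallel; a loop suffices here.
    for i, (addr, _data) in enumerate(buffer):
        if addr == target:
            entry = buffer.pop(i)
            if write_value is not None:
                entry[1] = write_value          # buffered data would be stale
            return entry
    return None
```

A read or a write that falls anywhere within a buffered entry removes that entry from the buffer, so the stale write-back never reaches main memory and the data remains available to the requesting unit.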
Because the write-back buffer may stall the microprocessor execution unit needing data, all these compares should be completed in time to allow the clearing of a write-back buffer entry quickly upon a match. This requires a large amount of parallel hardware for the compares. Because of these circuit complexities, the depth of the write-back buffer is typically set to prevent most microprocessor operation unit stalls during ordinary processing but not during context switches.
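The effect of buffer depth on stalls can be illustrated with a small counting model. This is a simplified sketch under stated assumptions: every miss evicts a dirty entry, every cache hit frees exactly one memory bus cycle for a write-back, and no other bus activity occurs. The function name `count_stalls` and the event encoding are hypothetical.

```python
def count_stalls(events, depth):
    """Count stalls for a sequence of memory accesses.

    events: string of 'M' (miss that evicts a dirty entry) and 'H' (cache hit).
    depth:  number of entries in the write-back buffer.
    """
    queued = 0   # entries currently waiting in the write-back buffer
    stalls = 0
    for event in events:
        if event == "M":
            if queued == depth:
                # Buffer full: a write-back must occur before the read.
                stalls += 1
                queued -= 1
            queued += 1
        elif event == "H" and queued > 0:
            # The hit needs no bus cycle, so one write-back is hidden behind it.
            queued -= 1
    return stalls
```

Under these assumptions, an access pattern that alternates misses and hits, such as `"MHMHMHMH"`, produces no stalls with a depth of two. A burst resembling a context switch, such as `"MMMMMMHHHHHH"`, produces four stalls at a depth of two, two stalls at a depth of four and none at a depth of six.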