This disclosure relates generally to the field of computer hardware, particularly to a cache in a processor of a computer, and more particularly to shadow registers for storage of least recently used (LRU) data in a cache.
Processor performance has been increasing rapidly from year to year, while memory access times have been improving more slowly. As a result, the latency of a cache miss, measured in processor cycles, is growing. Additionally, because higher degrees of instruction-level parallelism require more data bandwidth, cache miss latencies account for a growing share of overall execution time. Therefore, various attempts have been made to reduce and tolerate cache miss latency.
The cache is used by the central processing unit (CPU) of a computer system to reduce the average time to access memory. The cache is a relatively small, fast memory local to the CPU that stores copies of data from the most frequently accessed main memory locations. A CPU may include various types of local caches, such as an instruction cache and a data cache, and may also include various levels of caches, such as a level-2 (L2) cache and a level-3 (L3) cache. As long as most memory accesses are made within the cache, the average latency of memory accesses will be closer to the cache latency than to the latency of the main memory. A cache may include three local memory arrays: a tag array, a least recently used (LRU, or LRU/Valid) array, and a data array. When the CPU needs to read from or write to a memory address in the main memory, the CPU first checks the tag array to determine whether an entry corresponding to a copy of the data from that address is currently held in the data array of the cache, and simultaneously checks the LRU array. If there is a cache hit, the CPU immediately reads from or writes to the entry corresponding to the requested address in the data array, which is faster than reading from or writing to the main memory. The LRU array is updated simultaneously with the data array read or write. The cache therefore speeds up fetches by avoiding accesses to memory external to the cache.
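The hit path described above can be illustrated with a minimal Python sketch. It is a toy software model, not the disclosed hardware: the class name `SimpleCache`, the helper `install`, the address split, and the sizes (4 sets, 2 ways, 16-byte lines) are all assumptions chosen for illustration, and the "simultaneous" array accesses are modeled as ordinary sequential statements.

```python
class SimpleCache:
    """Toy set-associative cache with separate tag, LRU, and data arrays."""

    def __init__(self, num_sets=4, ways=2, line_bytes=16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        self.tag_array = [[None] * ways for _ in range(num_sets)]
        self.data_array = [[None] * ways for _ in range(num_sets)]
        # lru_array[s] lists way indices, least recently used first.
        self.lru_array = [list(range(ways)) for _ in range(num_sets)]

    def _split(self, address):
        """Split an address into (set index, tag); the layout is assumed."""
        line = address // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def _touch(self, set_index, way):
        """Mark a way as most recently used in the LRU array."""
        order = self.lru_array[set_index]
        order.remove(way)
        order.append(way)

    def lookup(self, address):
        """Check the tag array; on a hit, return the data and update LRU."""
        set_index, tag = self._split(address)
        for way in range(self.ways):
            if self.tag_array[set_index][way] == tag:
                self._touch(set_index, way)
                return True, self.data_array[set_index][way]
        return False, None

    def install(self, address, data):
        """Hypothetical helper: place a line in the least recently used way."""
        set_index, tag = self._split(address)
        victim = self.lru_array[set_index][0]
        self.tag_array[set_index][victim] = tag
        self.data_array[set_index][victim] = data
        self._touch(set_index, victim)
        return victim


cache = SimpleCache()
cache.install(0x40, "line A")   # set 0, way 0
cache.install(0x80, "line B")   # set 0, way 1
hit, data = cache.lookup(0x40)  # hit: way 0 becomes most recently used
miss, _ = cache.lookup(0xC0)    # miss: tag not present in set 0
```

In this model the three addresses map to the same set, so the hit on `0x40` leaves way 1 as the least recently used entry of that set.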
In the event of a cache miss, in which an entry corresponding to a copy of the data from the address in the main memory is not currently held in the data array of the cache, the CPU must locate the address in the main memory of the computing system. This may be a relatively slow process. When the data is retrieved from the address in the main memory, the data is written into temporary storage, referred to as a line fill buffer, until the cache is ready to receive the retrieved data into the data array and update the tag and LRU arrays. The LRU array keeps a record of which entry in the data array is the least recently used, so that this entry may be overwritten with the newly retrieved data. Before the contents of the line fill buffer are written into the data array and the tag and LRU arrays are simultaneously updated, the LRU array is checked again to determine which entry in the data array to overwrite. However, checking the LRU array a second time after retrieving the data from the address in main memory requires an additional array access cycle, which increases the total time needed to process a cache miss.
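The miss sequence described above can be traced with a short Python sketch for a single cache set. It is only an illustration under stated assumptions: the function `handle_miss`, its parameters, and the callback `fetch_from_memory` are hypothetical names, and each array access is simply recorded as one step rather than timed.

```python
def handle_miss(tag_array, lru_order, data_array, tag, fetch_from_memory):
    """Trace the conventional miss path for one set (toy model).

    lru_order lists way indices, least recently used first.
    fetch_from_memory is a hypothetical callback standing in for main memory.
    """
    array_accesses = []

    # Initial lookup: the tag and LRU arrays are read together; miss detected.
    array_accesses.append("read tag and LRU arrays")

    # The retrieved line waits in the line fill buffer (slow memory access).
    line_fill_buffer = fetch_from_memory(tag)

    # Second LRU array read: choose the entry to overwrite.
    victim = lru_order[0]
    array_accesses.append("re-read LRU array")

    # Write the line and update the tag and LRU arrays together.
    data_array[victim] = line_fill_buffer
    tag_array[victim] = tag
    lru_order.remove(victim)
    lru_order.append(victim)
    array_accesses.append("write data array, update tag and LRU arrays")

    return victim, array_accesses


tags = [7, 9]
data = ["old A", "old B"]
lru = [1, 0]  # way 1 is the least recently used entry
victim, accesses = handle_miss(tags, lru, data, 12, lambda t: f"line for tag {t}")
```

The trace records three array accesses for one miss; the middle one, the second LRU array read, corresponds to the additional array access cycle noted above.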