Historically, the performance of computer systems has been directly linked to the efficiency with which data can be accessed from memory, often expressed as the memory access time. Generally, the performance of a central processing unit (CPU or microprocessor), which operates at high speed, has been hindered by slow memory access times. Therefore, to expedite access to data in main memory, cache memories have been developed for storing frequently used information.
A cache is a relatively small high-speed memory that is used to hold the contents of the most recently utilized blocks of main storage. A cache bridges the gap between fast processor cycle time and slow memory access time. Using this very fast memory, the microprocessor can reduce the number of wait states that are interposed during memory accesses. When the processor issues a read instruction to the cache, the cache checks its contents to determine if the data is present. If the data is already present in the cache (termed a “hit”), the data is forwarded to the CPU with practically no wait. If, however, the data is not present (termed a “miss”), the cache must retrieve the data from a slower, secondary memory source, which may be the main memory or another cache in a multi-level cache hierarchy. In addition, the retrieved information is also copied (i.e., stored) into the cache memory so that it is readily available to the microprocessor for future use.
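The hit and miss behavior described above can be illustrated with a minimal sketch. The class name `SimpleCache`, the use of a dictionary to stand in for the slower memory, and the absence of any capacity limit or replacement policy are assumptions made purely for illustration; they are not taken from any particular hardware design.

```python
class SimpleCache:
    """Toy read cache in front of a slower backing store (hypothetical model)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict standing in for slower memory
        self.lines = {}                     # address -> data currently cached
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Hit: the data is already present and is returned immediately.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Miss: fetch from the slower source, then keep a copy for future use.
        self.misses += 1
        data = self.backing_store[address]
        self.lines[address] = data
        return data
```

Reading the same address twice through such a model would be expected to record one miss followed by one hit, since the first read leaves a copy in the cache.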
Most cache memories have a similar physical structure. Caches generally have two major subsystems: a tag subsystem (also referred to as a cache tag array) and a memory subsystem (also known as a cache data array). The tag subsystem holds address information and determines whether there is a match for a requested datum, and the memory subsystem stores and delivers the data upon request. Thus, typically, each tag entry is associated with a data array entry, and each tag entry stores an upper portion of the address of its associated data array entry. Some data processing systems have several cache memories in a multi-level cache hierarchy, in which case each data array will have a corresponding tag array to store addresses.
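As a rough sketch of how a tag array and a data array might cooperate, the following example assumes a direct-mapped organization, which is only one of several possible arrangements. The constants `LINE_SIZE` and `NUM_LINES` and the names `split_address` and `DirectMappedCache` are hypothetical values and identifiers chosen for illustration.

```python
LINE_SIZE = 16   # bytes per data array entry (assumed value)
NUM_LINES = 8    # number of tag/data entries (assumed value)


def split_address(address):
    """Split an address into (tag, index, offset) for a direct-mapped cache."""
    offset = address % LINE_SIZE                 # byte within the line
    index = (address // LINE_SIZE) % NUM_LINES   # which entry to inspect
    tag = address // (LINE_SIZE * NUM_LINES)     # upper portion of the address
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        self.tag_array = [None] * NUM_LINES    # address information
        self.data_array = [None] * NUM_LINES   # the cached lines themselves

    def lookup(self, address):
        """Return the cached byte on a tag match, or None on a miss."""
        tag, index, offset = split_address(address)
        if self.tag_array[index] == tag:
            return self.data_array[index][offset]
        return None

    def fill(self, address, line_bytes):
        """Store a full line and record its tag in the matching tag entry."""
        tag, index, _ = split_address(address)
        self.tag_array[index] = tag
        self.data_array[index] = line_bytes
```

In this sketch, two addresses that share an index but differ in their upper bits compete for the same entry, which is why the stored tag must be compared before the data array entry is trusted.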
To speed up memory access operations, caches rely on the principles of temporal and spatial locality. These principles of locality are based on the assumption that, in general, a computer program accesses only a relatively small portion of the information available in computer memory in a given period of time. In particular, temporal locality holds that if some information is accessed once, it is likely to be accessed again soon, and spatial locality holds that if one memory location is accessed, then other nearby memory locations are also likely to be accessed. Thus, in order to exploit temporal locality, caches temporarily store information from a slower-level memory the first time it is accessed so that, if it is accessed again soon, it need not be retrieved from the slower-level memory. To exploit spatial locality, caches transfer several blocks of data from contiguous addresses in slower-level memory, in addition to the requested block of data, each time data is written into the cache from slower-level memory.
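The effect of both kinds of locality on the miss count can be sketched as follows. The function `count_misses` and the default `BLOCK_SIZE` are hypothetical, and the model assumes an unbounded cache that fetches one aligned group of contiguous addresses per miss, which simplifies the behavior described above.

```python
BLOCK_SIZE = 4  # contiguous addresses fetched together on a miss (assumed value)


def count_misses(addresses, block_size=BLOCK_SIZE):
    """Count misses for an access sequence in an unbounded toy cache."""
    cached_blocks = set()
    misses = 0
    for address in addresses:
        block = address // block_size  # aligned group containing this address
        if block not in cached_blocks:
            misses += 1                # first touch of the group: fetch it all
            cached_blocks.add(block)
    return misses


sequential = count_misses(range(16))                   # 4 misses
no_prefetch = count_misses(range(16), block_size=1)    # 16 misses
repeated = count_misses([5, 5, 5])                     # 1 miss
```

Under these assumptions, a sequential scan misses once per group rather than once per address (spatial locality), and repeated reads of one address miss only the first time (temporal locality).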
Utilizing a multi-level cache memory hierarchy can generally improve the performance of a central processing unit. In a multi-level cache hierarchy, a series of caches (L0, L1, L2, or more) can be linked together, where each cache is accessed serially by the microprocessor. For example, in a three-level cache system, the microprocessor will first access the fast L0 cache for data and, in case of a miss, it will access the slower L1 cache. If L1 does not contain the data, the microprocessor will access the slower but larger L2 cache before accessing the main memory. Since caches are typically smaller and faster than the main memory, the general trend is to design computer systems using a multi-level cache hierarchy.
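The serial access order described above might be modeled as in the following sketch. The function name `read_through_hierarchy`, the use of dictionaries for each cache level, and the policy of copying the data into every level that missed are all assumptions made for illustration.

```python
def read_through_hierarchy(address, levels, main_memory):
    """Search caches in order (L0 first), then fall back to main memory.

    `levels` is a list of dicts ordered fastest first; `main_memory` is a dict.
    Returns the data and the name of the level that supplied it.
    """
    for depth, cache in enumerate(levels):
        if address in cache:
            data = cache[address]
            source = f"L{depth}"
            break
    else:
        # Every cache level missed, so main memory supplies the data.
        data = main_memory[address]
        depth = len(levels)
        source = "main memory"

    # Assumed fill policy: copy the data into each faster level that missed.
    for cache in levels[:depth]:
        cache[address] = data
    return data, source
```

With three initially cold levels, a first read in this model would be served by main memory and a second read of the same address by L0, because the first read fills every level on the way back.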
In a multi-level cache system, various latencies are typically incurred when a miss occurs, due to the time it takes to provide new data to each level of the multi-level hierarchy.
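To make the accumulation of latency concrete, the following sketch adds up the cost of each level consulted in a serial lookup. The cycle counts in `LATENCY_CYCLES` are hypothetical round numbers chosen only for illustration, not measurements of any real system, and the function name `access_latency` is likewise an assumption.

```python
# Hypothetical per-level access costs in cycles: L0, L1, L2, main memory.
LATENCY_CYCLES = [1, 4, 12, 100]


def access_latency(serving_level, latencies=LATENCY_CYCLES):
    """Total cycles when levels 0..serving_level are consulted one after another."""
    return sum(latencies[: serving_level + 1])


l0_hit = access_latency(0)     # 1 cycle
l2_hit = access_latency(2)     # 1 + 4 + 12 = 17 cycles
full_miss = access_latency(3)  # 1 + 4 + 12 + 100 = 117 cycles
```

Under this simplified model, a request that misses in every cache pays for each unsuccessful lookup in addition to the main memory access itself.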