A cache in a processing unit is a smaller, faster memory used by the processing unit (e.g., the central processing unit (CPU)) of a computer to reduce the average time to access memory. A processing unit uses caches to store copies of data from frequently used main memory locations. Many CPUs have several independent caches, including instruction and data caches. Data and instruction caches are generally organized as a hierarchy of cache levels: level 1 (L1), level 2 (L2), etc. These caches may work together in a single-processor or multiple-processor environment to improve computer performance.
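As a rough illustration of how a cache keeps copies of frequently used memory locations, the following sketch models a toy direct-mapped cache that only counts hits and misses. The class name `DirectMappedCache`, the sizes (4 lines of 16 bytes), and the address sequence are all hypothetical choices made for illustration; real caches are typically set-associative and far larger.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks hits and misses only (no data)."""

    def __init__(self, num_lines=4, line_size=16):
        # Hypothetical sizes, chosen only to keep the example small.
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per line; None means empty
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        block = address // self.line_size   # which memory block holds the address
        index = block % self.num_lines      # which cache line that block maps to
        tag = block // self.num_lines       # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag              # simulate fetching the line from memory
        self.misses += 1
        return False


cache = DirectMappedCache()
# Touch the same small set of addresses twice; only first touches of a line miss.
for address in [0, 4, 16, 20, 0, 4, 16, 20]:
    cache.access(address)

print(f"hits={cache.hits} misses={cache.misses}")
```

In this sequence the addresses fall into only two distinct cache lines, so after each line is fetched once, every later access is served from the cache.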
Cache access latency is a major contributor to microprocessor performance. The time taken to fetch one cache line from memory, including the latency due to a cache miss, affects performance because a CPU can run out of work to do while waiting for the cache line. A CPU in this state is said to be stalled. As CPUs become faster relative to main memory, stalls due to cache misses displace more potential computation. To illustrate, some current CPUs can execute hundreds of instructions in the time taken to fetch a single cache line from main memory.
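The effect of miss latency on performance can be sketched with the standard average memory access time relation: hit time plus miss rate times miss penalty. The function name `average_access_time` and all cycle counts below are assumed values for illustration, not measurements of any particular CPU.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: a 4-cycle cache hit and a 200-cycle fetch from main memory.
HIT_TIME_CYCLES = 4
MISS_PENALTY_CYCLES = 200

for miss_rate in (0.01, 0.05, 0.10):
    amat = average_access_time(HIT_TIME_CYCLES, miss_rate, MISS_PENALTY_CYCLES)
    print(f"miss rate {miss_rate:.0%}: about {amat:.1f} cycles per access")
```

Under these assumed numbers, even a small miss rate contributes more cycles to the average than the hit time itself, which is why stalls on cache misses matter more as the gap between CPU and main memory speed grows.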