Processors in processing systems typically use cache memories for fast access to data stored in a main memory. When such a processor requests data from the main memory, the requested data is delivered to a cache memory and then delivered to the processor from the cache memory. When the processor issues a subsequent request for the same data, the processing system first checks the cache memory. If the requested data resides in the cache, a cache “hit” occurs, and the data is delivered to the processor from the cache. If the data is not resident in the cache, a cache “miss” occurs, and the data is retrieved from main memory. Frequently used data thus tends to be retrieved more rapidly than less frequently requested data. Storage of frequently used data in cache tends to reduce overall data access latency, i.e., the time between a processor request for data and delivery of the data to the processor.
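The hit and miss behavior described above can be illustrated with a minimal sketch. The class name `CachedMemory`, its fields, and the unbounded dictionary used as the cache are assumptions made for illustration only; an actual cache has finite capacity and a replacement policy, neither of which is modeled here.

```python
class CachedMemory:
    """Toy model (hypothetical): a single cache in front of a main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict mapping address -> data
        self.cache = {}                 # unbounded for simplicity (assumption)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Check the cache first.
        if address in self.cache:
            self.hits += 1              # cache "hit": serve from the cache
            return self.cache[address]
        # Cache "miss": retrieve from main memory and place the data in
        # the cache, so that it is delivered to the requester from there.
        self.misses += 1
        self.cache[address] = self.main_memory[address]
        return self.cache[address]


memory = CachedMemory({0x10: "a", 0x20: "b"})
first = memory.read(0x10)   # miss: data comes from main memory
second = memory.read(0x10)  # hit: same data, now served from the cache
```

After the two reads, the model has recorded one miss and one hit for the same address, which corresponds to the reduced latency of the subsequent request.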
Processing system designers have used the concept of cache hierarchy to enhance system performance over a wide variety of applications. A cache hierarchy typically includes a fast but small primary cache at the lowest level of the hierarchy. Upper-level caches typically are used to hold data accessed less frequently than data kept in the primary cache. Thus, levels of cache generally are arranged in order of decreasing speed and increasing size. When a cache miss occurs at the primary cache level, the processing system checks the upper cache level(s) for the requested data before accessing the data from main memory. Levels of a cache hierarchy typically are searched in a fixed sequence, from lowest to highest. Although requests that can be satisfied out of primary cache incur relatively minimal latency, latency increases as each additional level is searched in turn.
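The fixed-sequence search and its accumulating latency can be sketched as follows. The names `CacheLevel` and `hierarchical_read`, the cycle counts, and the choice to fill only the primary cache after a main memory access are all illustrative assumptions; the description above does not specify latencies or a fill policy.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLevel:
    name: str
    latency: int  # hypothetical cost, in cycles, of searching this level
    lines: dict = field(default_factory=dict)  # address -> data


def hierarchical_read(levels, main_memory, memory_latency, address):
    """Search levels from lowest to highest.

    Returns (data, total_latency, source).
    """
    total = 0
    for level in levels:
        total += level.latency          # each level searched adds latency
        if address in level.lines:
            return level.lines[address], total, level.name
    # Every level missed: access main memory.
    total += memory_latency
    data = main_memory[address]
    levels[0].lines[address] = data     # fill primary cache only (assumption)
    return data, total, "main memory"


levels = [
    CacheLevel("primary", 1, {0xA: "x"}),
    CacheLevel("secondary", 10, {0xB: "y"}),
]
main = {0xA: "x", 0xB: "y", 0xC: "z"}

r1 = hierarchical_read(levels, main, 100, 0xA)  # found in primary
r2 = hierarchical_read(levels, main, 100, 0xB)  # primary miss, secondary hit
r3 = hierarchical_read(levels, main, 100, 0xC)  # both miss, main memory
r4 = hierarchical_read(levels, main, 100, 0xC)  # now found in primary
```

With these assumed costs, a request satisfied by the primary cache takes 1 cycle, one satisfied by the secondary cache takes 11, and one that falls through to main memory takes 111, since the cost of every level searched along the way is added to the total.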