1. Field of the Invention
This invention relates generally to processor-based systems, and, more particularly, to concurrent flushing of the multiple caches.
2. Description of the Related Art
Many processing devices utilize caches to reduce the average time required to access information stored in a memory. A cache is a smaller and faster memory that stores copies of instructions and/or data that are expected to be used relatively frequently. For example, central processing units (CPUs) are generally associated with a cache or a hierarchy of cache memory elements. Processors other than CPUs, such as, for example, graphics processing units and others, are also known to use caches. Instructions or data that are expected to be used by the CPU are moved from (relatively large and slow) main memory into the cache. When the CPU needs to read or write a location in the main memory, it first checks to see whether the desired memory location is included in the cache memory. If this location is included in the cache (a cache hit), then the CPU can perform the read or write operation on the copy in the cache memory location. If this location is not included in the cache (a cache miss), then the CPU needs to access the information stored in the main memory and, in some cases, the information can be copied from the main memory and added to the cache. Proper configuration and operation of the cache can reduce the latency of memory accesses below the latency of the main memory to a value close to the value of the cache memory.
One widely used architecture for a CPU cache memory is a hierarchical cache that divides the cache into two levels known as the L1 cache and the L2 cache. The L1 cache is typically a smaller and faster memory than the L2 cache, which is smaller and faster than the main memory. The CPU first attempts to locate needed memory locations in the L1 cache and then proceeds to look successively in the L2 cache and the main memory when it is unable to find the memory location in the cache. The L1 cache can be further subdivided into separate L1 caches for storing instructions (L1-I) and data (L1-D). The L1-I cache can be placed near entities that require more frequent access to instructions than data, whereas the L1-D can be placed closer to entities that require more frequent access to data than instructions. The L2 cache is typically associated with both the L1-I and L1-D caches and can store copies of instructions or data that are retrieved from the main memory. Frequently used instructions are copied from the L2 cache into the L1-I cache and frequently used data can be copied from the L2 cache into the L1-D cache. The L2 cache is therefore referred to as a unified cache.
Caches are typically flushed prior to powering down the CPU. Flushing includes writing back modified or “dirty” cache lines to the main memory and invalidating all of the lines in the cache. Microcode can be used to sequentially flush different cache elements in the CPU cache. For example, in conventional processors that include an integrated L2 cache, microcode first flushes the L1 cache by writing dirty cache lines into main memory. Once flushing of the L1 cache is complete, the microcode flushes the L2 cache by writing dirty cache lines into the main memory.