The claimed subject matter relates generally to processor-based systems, and, more particularly, to concurrent access to cache dirty bits.
Many processing devices utilize caches to reduce the average time required to access information stored in a memory. A cache is a smaller and faster memory that stores copies of instructions and/or data that are expected to be used relatively frequently. For example, central processing units (CPUs) are generally associated with a cache or a hierarchy of cache memory elements. Processors other than CPUs, such as graphics processing units, are also known to use caches. Instructions or data that are expected to be used by the CPU are moved from (relatively large and slow) main memory into the cache. When the CPU needs to read or write a location in the main memory, it first checks to see whether the desired memory location is included in the cache memory. If this location is included in the cache (a cache hit), then the CPU can perform the read or write operation on the copy in the cache memory location. If this location is not included in the cache (a cache miss), then the CPU needs to access the information stored in the main memory and, in some cases, the information can be copied from the main memory and added to the cache. Proper configuration and operation of the cache can reduce the average latency of memory accesses below the latency of the main memory to a value close to the latency of the cache memory.
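By way of illustration only, the hit and miss behavior described above may be sketched as follows. The class name SimpleCache, the first-in-first-out eviction, and the helper average_access_time are assumptions introduced for this sketch and are not taken from any particular processor; the formula used is the common textbook estimate of average latency (hit time plus miss rate times miss penalty).

```python
class SimpleCache:
    """Toy cache in front of a slower backing store (illustrative only)."""

    def __init__(self, memory, capacity):
        self.memory = memory        # dict: address -> value, stands in for main memory
        self.capacity = capacity    # maximum number of cached locations
        self.lines = {}             # address -> cached copy, kept in insertion order
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:   # cache hit: serve the cached copy
            self.hits += 1
            return self.lines[address]
        self.misses += 1            # cache miss: access main memory
        value = self.memory[address]
        if len(self.lines) >= self.capacity:
            # Assumed policy: evict the oldest entry (FIFO).
            self.lines.pop(next(iter(self.lines)))
        self.lines[address] = value  # copy the information into the cache
        return value


def average_access_time(hit_time, miss_rate, miss_penalty):
    """Estimated average latency: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty
```

With hypothetical figures of a 1-cycle hit time, a 100-cycle miss penalty, and a 5% miss rate, the estimate is about 6 cycles, which is far closer to the latency of the cache than to that of the main memory.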
One widely used architecture for a CPU cache memory is a hierarchical cache that divides the cache into two levels known as the L1 cache and the L2 cache. The L1 cache is typically a smaller and faster memory than the L2 cache, which is in turn smaller and faster than the main memory. The CPU first attempts to locate needed memory locations in the L1 cache and then proceeds to look successively in the L2 cache and the main memory when it is unable to find the memory location in the L1 cache. The L1 cache can be further subdivided into separate L1 caches for storing instructions (L1-I) and data (L1-D). The L1-I cache can be placed near entities that require more frequent access to instructions than data, whereas the L1-D cache can be placed closer to entities that require more frequent access to data than instructions. The L2 cache is typically associated with both the L1-I and L1-D caches and can store copies of instructions or data that are retrieved from the main memory. Frequently used instructions can be copied from the L2 cache into the L1-I cache and frequently used data can be copied from the L2 cache into the L1-D cache. Because it holds both instructions and data, the L2 cache is referred to as a unified cache.
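The successive lookup through the hierarchy may be sketched, under simplifying assumptions, as follows. The function name fetch and the use of unbounded dictionaries for the L1-I, L1-D, and L2 caches are hypothetical choices made for brevity; capacity limits and eviction are omitted.

```python
def fetch(address, kind, l1i, l1d, l2, memory):
    """Look up `address` in L1, then the unified L2, then main memory.

    `kind` is "instruction" or "data" and selects the L1-I or L1-D cache.
    Returns (value, level), where level names where the value was found.
    """
    l1 = l1i if kind == "instruction" else l1d
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        value = l2[address]
        l1[address] = value      # copy from the unified L2 into the matching L1
        return value, "L2"
    value = memory[address]
    l2[address] = value          # fill L2 and the matching L1 on the way back
    l1[address] = value
    return value, "memory"
```

In this sketch a value retrieved from the main memory is placed in the L2 cache and in only one of the two L1 caches, depending on whether it was requested as an instruction or as data.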
Caches are typically flushed prior to powering down the CPU. Flushing includes writing back modified or “dirty” cache lines to the main memory and invalidating all of the lines in the cache. Microcode can be used to sequentially flush different cache elements in the CPU cache. For example, in conventional processors that include an integrated L2 cache, microcode first flushes the L1 cache by writing dirty cache lines into the L2 cache or main memory. Once flushing of the L1 cache is complete, the microcode flushes the L2 cache by writing dirty cache lines into the main memory. Caches may also be “rinsed” by writing back one or more modified or “dirty” cache lines to the main memory and not invalidating the lines that are written back. Rinsing may be performed in the background and typically writes back a few lines in the cache to make these lines “clean,” but the other cache values are left in their current states.
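The difference between flushing and rinsing may be illustrated with the following minimal sketch. The names CacheLine, flush, rinse, and flush_hierarchy, the per-line valid and dirty flags, and the max_lines limit are assumptions made for illustration; actual microcode and hardware operate quite differently.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    value: int
    valid: bool = True
    dirty: bool = False   # True when the line was modified after being cached


def flush(cache, backing):
    """Write back every dirty line to `backing`, then invalidate all lines."""
    for address, line in cache.items():
        if line.valid and line.dirty:
            backing[address] = line.value
        line.dirty = False
        line.valid = False


def rinse(cache, backing, max_lines):
    """Write back up to `max_lines` dirty lines; they stay valid but become clean."""
    written = 0
    for address, line in cache.items():
        if written >= max_lines:
            break
        if line.valid and line.dirty:
            backing[address] = line.value
            line.dirty = False
            written += 1
    return written


def flush_hierarchy(l1, l2, memory):
    """Flush L1 into L2 first, then flush L2 into main memory."""
    written_back = {}
    flush(l1, written_back)
    for address, value in written_back.items():
        l2[address] = CacheLine(value, dirty=True)  # L2 now holds the modified copy
    flush(l2, memory)
```

After flush every line is invalid, whereas after rinse the written-back lines remain valid and any other dirty lines are left in their current state.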