Computer systems may include multiple levels of memories arranged in a hierarchical manner. For example, a computer may include a main random access memory (RAM), and a central processing unit (CPU) which includes one or more levels of cache memory. Each level of cache is generally faster and smaller than the level above it, but more expensive, either in terms of dollars or silicon area. Caches are provided to enhance the execution speed of an executing program in a manner transparent to the program. Each cache is used to hold valid data logically viewed by the program to reside in RAM, for faster access by the CPU. Such a system may include a level one (L1) cache which may advantageously be located within the CPU, and may further include a level two (L2) cache which may be located within the CPU as an on-chip or off-chip memory, and may further include a level three (L3) cache which would more typically be located on the motherboard. Other configurations are, of course, possible.
Because the same logical data are represented at physically distinct locations in perhaps one or more levels of cache, and because a given logical datum may be written to those various levels at distinct times, there is a need for maintaining logical coherency or consistency between the levels of memory.
Advanced computer systems may include a plurality of agents which are capable of reading and/or writing to memory. This complicates the cache consistency requirement. Such agents may include various entities within a single CPU. They may also include a plurality of CPUs in a single system. Or, they may include other types of agents such as direct memory access (DMA) controllers or the like.
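The consistency problem described above can be illustrated with a minimal sketch. It is a toy model only, not the mechanism of any particular system: the class name WriteBackCache, its methods, and the address 0x100 are hypothetical, and a plain dictionary stands in for RAM. One agent writes through a write-back cache while a second agent reads RAM directly and observes a stale value until the modified line is written back.

```python
class WriteBackCache:
    """Toy write-back cache in front of a backing store (hypothetical model)."""

    def __init__(self, backing):
        self.backing = backing  # next level of the hierarchy; a dict stands in for RAM
        self.lines = {}         # address -> (value, dirty flag)

    def read(self, addr):
        # On a miss, fill the line from the backing store as a clean copy.
        if addr not in self.lines:
            self.lines[addr] = (self.backing[addr], False)
        return self.lines[addr][0]

    def write(self, addr, value):
        # Update only this level; the backing store now holds stale data.
        self.lines[addr] = (value, True)

    def flush(self, addr):
        # Write a dirty line back so the backing store is consistent again.
        if addr in self.lines:
            value, dirty = self.lines[addr]
            if dirty:
                self.backing[addr] = value
                self.lines[addr] = (value, False)


ram = {0x100: 1}
l1 = WriteBackCache(ram)

l1.write(0x100, 2)   # agent A writes through its cache
stale = ram[0x100]   # agent B reads RAM directly, bypassing the cache
l1.flush(0x100)      # write-back restores consistency
fresh = ram[0x100]
```

In this sketch the second agent sees the old value between the write and the flush. A consistency scheme must either prevent that window or make the second agent's access observe the cached copy.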
In such a system, the various caches may be coupled to various combinations of buses. It is desirable that the various agents access the caches over these buses in a non-blocking manner, to enhance system performance.
What is desired is an improved system and method for maintaining cache consistency for non-blocking, multi-level caching. This is particularly desirable in a system which has any or all of the following characteristics: out-of-order instruction processing, speculative execution of instructions (especially those involving bus requests), deep bus pipelining, support of self-modifying code, code and data cache fetching independence, L2 cache integration, and multi-level internal caching.