1. Field of the Invention
The present invention generally relates to microprocessors, and particularly relates to managing microprocessor caches.
2. Relevant Background
Higher-performance microprocessors often use a hierarchical memory structure, including a base amount of main memory and one or more higher levels of smaller, faster cache memories, to more closely match the speed of the memory to the processor speed. For example, Level 1 (L1) caches generally reside on-chip and represent the smallest, fastest cache available to the microprocessor. Level 2 (L2) caches reside on-chip or off-chip, and provide somewhat slower but typically larger amounts of cache memory than an L1 cache for the microprocessor. There may be additional levels of progressively slower (and larger) cache memories between the microprocessor and the main memory.
Cache memory operates as a buffer between the microprocessor and the (comparatively) slow main memory, and is used to hold copies of the instructions and/or data that the microprocessor is most likely to need. If a copy of a needed instruction or data item resides in the cache, the microprocessor reads or writes that copy instead of accessing the main memory, and thereby avoids the potentially much longer access delays associated with a main memory access.
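The buffering behavior described above can be sketched as follows. This is a minimal illustration only, not a description of any particular hardware: the class names `MainMemory` and `WriteBackCache`, the unbounded capacity, and the write-back policy with an explicit `flush()` are all assumptions made for the example.

```python
class MainMemory:
    """Hypothetical backing store; each access here stands for a slow access."""

    def __init__(self, size):
        self.cells = [0] * size
        self.accesses = 0  # number of slow main memory accesses performed

    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value


class WriteBackCache:
    """Hypothetical cache with unbounded capacity and a write-back policy."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # addr -> cached copy of the data item
        self.dirty = set()  # addresses whose cached copy is newer than memory
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.lines:
            # Hit: serve the cached copy and skip the main memory access.
            self.hits += 1
            return self.lines[addr]
        # Miss: fetch from main memory and keep a copy for later accesses.
        self.misses += 1
        value = self.memory.read(addr)
        self.lines[addr] = value
        return value

    def write(self, addr, value):
        # Update only the cached copy; main memory is updated on flush().
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        # Write every modified copy back to main memory.
        for addr in sorted(self.dirty):
            self.memory.write(addr, self.lines[addr])
        self.dirty.clear()
```

In this sketch, a second read of the same address is served from `lines` without touching `MainMemory`, which is the access-delay saving the paragraph above describes.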
Caching operations become more complicated in multiprocessor environments, where two or more microprocessors share memory space. In such environments, multiple microprocessors may each cache a copy of the same data item from main memory. To ensure that each microprocessor accesses the most recently updated value of a given data item, some method of synchronizing the caches among the microprocessors must be used. Cache synchronization maintains cache “coherency” by providing some mechanism to prevent the individual microprocessors from using a data item whose value has become outdated through the operations of the other microprocessors. Cache synchronization can be managed either by hardware-enforced coherency or by software through cache management instructions.
One type of hardware-enforced cache coherency is a “broadcast” type approach. Broadcast-based approaches to cache synchronization generally rely on each microprocessor transmitting messages related to data memory operations. In turn, the individual microprocessors, or their cache controllers, monitor (“snoop”) those messages to determine whether the actions of another microprocessor have invalidated any data items held in their associated caches.
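A broadcast-and-snoop arrangement of this kind might be modeled as in the sketch below. The names `SnoopBus` and `SnoopingCache`, the write-through update of shared memory, and the invalidate-on-write rule are assumptions chosen to keep the example short; real coherency protocols track more state per cache line.

```python
class SnoopBus:
    """Hypothetical shared bus that delivers write messages to all other caches."""

    def __init__(self):
        self.caches = []
        self.messages = 0  # snoop messages delivered so far

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, sender, addr):
        # Every cache except the sender snoops the write message.
        for cache in self.caches:
            if cache is not sender:
                self.messages += 1
                cache.snoop_write(addr)


class SnoopingCache:
    """Hypothetical per-processor cache that invalidates on snooped writes."""

    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory  # shared dict standing in for main memory
        self.lines = {}       # addr -> locally cached copy
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:
            # Miss (or previously invalidated): refetch from shared memory.
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Write-through for simplicity, then announce the write on the bus.
        self.lines[addr] = value
        self.memory[addr] = value
        self.bus.broadcast_write(self, addr)

    def snoop_write(self, addr):
        # Another processor wrote this address, so any local copy is stale.
        self.lines.pop(addr, None)


# Usage: two caches share one data item; a write by one invalidates the other.
shared_memory = {0x10: 1}
bus = SnoopBus()
cache_a = SnoopingCache(bus, shared_memory)
cache_b = SnoopingCache(bus, shared_memory)
cache_a.read(0x10)      # cache_a now holds a copy
cache_b.write(0x10, 2)  # broadcast removes cache_a's stale copy
```

After the write by `cache_b`, the next read by `cache_a` misses and refetches the updated value rather than using its outdated copy.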
The use of these so-called “snoopy” buses thus represents a relatively straightforward and effective method of maintaining cache coherency in multiprocessor systems. However, snoopy buses can reduce the effective access bandwidth of cache memory, because snoop accesses to a given cache are typically carried on the same “port” or access bus that is used for locally generated cache accesses by the microprocessor(s). The amount of snoop traffic increases significantly as the microprocessor count increases and, eventually, can significantly limit overall system performance.
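A rough back-of-the-envelope model can illustrate why snoop traffic grows with processor count. The model below is an assumption made for illustration, not a measurement: it supposes that every processor broadcasts the same number of writes per interval and that each broadcast costs one lookup on the single port of every other cache. The function names are hypothetical.

```python
def snoop_lookups_per_cache(num_processors, writes_per_processor):
    """Snoop lookups one cache port must service per interval (assumed model)."""
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    # Each of the other (N - 1) processors broadcasts its writes to this cache.
    return (num_processors - 1) * writes_per_processor


def snoop_share_of_port(num_processors, writes_per_processor, local_accesses):
    """Fraction of a single shared cache port consumed by snoop lookups."""
    snoops = snoop_lookups_per_cache(num_processors, writes_per_processor)
    total = snoops + local_accesses
    return snoops / total if total else 0.0
```

Under these assumptions, going from 2 to 8 processors at 100 writes each raises the snoop lookups seen by each cache from 100 to 700 per interval, all competing with locally generated accesses for the same port.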
Various workarounds to the cache access interference problems posed by high volumes of snoop traffic include the use of multi-ported cache memory, where snoop traffic and locally generated traffic access the cache on different ports. However, such configurations can significantly increase the size, power consumption, and expense of the cache