1. Field of the Invention
Embodiments of the present invention relate to techniques for improving the performance of computer systems. More specifically, embodiments of the present invention relate to cache-coherency protocols for multiprocessor systems.
2. Related Art
Modern multiprocessor systems include a number of processors that are coupled to a memory hierarchy. The memory hierarchy typically includes one or more levels of cache memory and a main memory, all of which are shared between the processors. In such systems, shared cache lines can be accessed by any of the processors, and the systems use a cache-coherency mechanism to control access to the shared cache lines. These cache-coherency mechanisms typically enforce a cache-coherency protocol that dictates the ways in which processors in the system can access cache lines. For example, one common cache-coherency protocol is the MESI protocol, which provides four possible states in which cache lines in the system can be held: modified (M), exclusive (E), shared (S), and invalid (I).
Some multiprocessor systems use a directory-based cache-coherency mechanism to maintain cache lines in a coherent state. In such systems, a directory keeps track of status information for cache lines in the system. For example, the directory can keep track of which processors in the system have a shared copy of a given cache line. Unfortunately, the amount of circuitry required to implement such a directory increases as the number of sharers (i.e., processors) in the system increases. This increased circuitry requires more semiconductor area and consumes more power.
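As an illustration of why directory storage grows with the number of sharers, the following minimal sketch models a directory that keeps one presence bit per processor for each cache line. The class name `FullMapDirectory`, its methods, and the use of a Python dictionary keyed by line address are assumptions made for illustration only; they do not describe any particular hardware implementation.

```python
class FullMapDirectory:
    """Hypothetical directory with one presence bit per processor per line."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.sharers = {}  # line address -> integer used as a bit-vector

    def add_sharer(self, line, proc):
        # Set the presence bit for processor `proc` on this cache line.
        self.sharers[line] = self.sharers.get(line, 0) | (1 << proc)

    def get_sharers(self, line):
        # Return the exact list of processors holding a shared copy.
        mask = self.sharers.get(line, 0)
        return [p for p in range(self.num_processors) if (mask >> p) & 1]

    def bits_per_entry(self):
        # Storage per directory entry grows linearly with processor count.
        return self.num_processors
```

In this sketch, a directory for 64 processors needs 64 presence bits per entry, compared with 8 bits per entry for 8 processors, which is the growth in circuitry described above.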
To avoid increasing the size of the directory as the number of sharers increases, some designers have proposed techniques that reduce the amount of information stored in the directory. One such technique uses a coarse bit-mask for a shared cache line to keep an approximate record of the identities of processors that have a shared copy of the cache line. In this approach, each bit in the bit-mask represents two or more processors that can have a shared copy of the cache line. Unfortunately, the coarse bit-mask approach scales poorly. More specifically, because each bit represents several processors, the directory cannot distinguish among them; when a sharer needs to be invalidated, all other sharers that are identified using the same bit must also be invalidated.
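The loss of precision in a coarse bit-mask can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `CoarseDirectory`, its methods, and the mapping of contiguous processor IDs to a single bit are all hypothetical choices, since the grouping of processors per bit is not specified here.

```python
class CoarseDirectory:
    """Hypothetical directory in which each mask bit covers several processors."""

    def __init__(self, num_processors, procs_per_bit):
        self.num_processors = num_processors
        self.procs_per_bit = procs_per_bit
        # Ceiling division: number of mask bits needed per directory entry.
        self.num_bits = -(-num_processors // procs_per_bit)
        self.masks = {}  # line address -> integer used as a coarse bit-mask

    def add_sharer(self, line, proc):
        # Assumed grouping: contiguous processor IDs share one bit.
        bit = proc // self.procs_per_bit
        self.masks[line] = self.masks.get(line, 0) | (1 << bit)

    def invalidation_targets(self, line):
        # Every processor covered by a set bit must be invalidated,
        # because the mask does not record which one holds the line.
        mask = self.masks.get(line, 0)
        targets = []
        for bit in range(self.num_bits):
            if (mask >> bit) & 1:
                start = bit * self.procs_per_bit
                end = min(start + self.procs_per_bit, self.num_processors)
                targets.extend(range(start, end))
        return targets
```

With 8 processors and 4 processors per bit, each entry needs only 2 bits, but a line shared solely by processor 1 yields invalidation targets 0 through 3. Increasing the number of processors per bit saves storage at the cost of more unnecessary invalidations.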