1. Field of the Invention
The present invention relates to computer systems employing a cache coherency directory and, more specifically, to a system for increasing the number of associativity classes in a cache directory.
2. Description of the Prior Art
A cache stores, in a relatively fast memory system, duplicates of data stored elsewhere in a relatively slower memory system. Frequently accessed data can be stored for rapid access in a cache. During processing of the data, cached data can be accessed rather than the original data. Once the cached data has not been accessed for a given amount of time, the cached data is written back to its original memory location and room in the cache is made for new data. Processing speed can be improved significantly through use of a cache.
Use of a cache presents a challenge in multiprocessor systems. This is because each processor may use its own cache, but all of the processors may share the same main memory. In this case, if two different processors access the same data, but operate on it in their own caches, then the data can become incoherent. Therefore, a cache coherency directory is often used to maintain the coherency of the caches in a multiprocessor system. A cache coherency directory records the addresses and the status of each cache line in a system.
To operate a cache coherency directory effectively, the system must employ a cache coherency protocol. One example of a cache coherency protocol, MESI (Modified, Exclusive, Shared, Invalid), supports efficient maintenance of a cache. In the protocol, each cache line is assigned one of four states: Modified, in which the cache line is present only in the current cache and has been modified from the corresponding value in main memory, so that the cache must write the currently stored data back to main memory before any other read of the corresponding main memory location is permitted; Exclusive, in which the cache line is present only in the current cache and currently matches main memory; Shared, in which the cache line matches main memory and may also be stored in other caches of the machine; and Invalid, in which the cache line is invalid.
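The four states described above can be sketched as a small enumeration. This is an illustrative sketch only: the names `MESI`, `must_write_back`, and `on_remote_read` are hypothetical, and the transition shown for a read by another cache follows one common textbook convention rather than any particular implementation.

```python
from enum import Enum


class MESI(Enum):
    MODIFIED = "M"   # only copy, differs from main memory
    EXCLUSIVE = "E"  # only copy, matches main memory
    SHARED = "S"     # matches main memory, other caches may hold it
    INVALID = "I"    # line holds no valid data


def must_write_back(state):
    """Only a Modified line holds data newer than main memory."""
    return state is MESI.MODIFIED


def on_remote_read(state):
    """State of a local line after another cache reads the same address.

    Assumed convention: a Modified line is written back first, then any
    valid line ends up Shared; an Invalid line stays Invalid.
    """
    if state is MESI.INVALID:
        return MESI.INVALID
    return MESI.SHARED
```

For example, `on_remote_read(MESI.EXCLUSIVE)` yields `MESI.SHARED`, and `must_write_back` is true only for `MESI.MODIFIED`.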
For example, consider a computer system with four processor busses and one processor socket per bus. Each processor socket most likely contains one or more levels (L1/L2) of on-die cache. The four processor bus segments are connected to a northbridge that is capable of satisfying memory and I/O requests and is also tasked with maintaining cache coherency amongst the bus segments. Several methods are known for maintaining coherency in a multiple processor bus system. One approach is to broadcast all snoops on the other processor bus segments. A second solution utilizes a coherence directory (or snoop filter) in the northbridge to track cache lines as they are requested by the processors. A coherency directory's usefulness increases as the number of processor bus segments grows. For example, broadcast snoop traffic in a four-bus system reduces the usable bus bandwidth to only 25% of the theoretical peak.
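One simple way to arrive at the 25% figure is the following back-of-the-envelope model. It rests on assumptions not stated in the text: every bus issues requests at the same rate, every request is snooped on every other bus, and a snoop occupies the same bus slot as a request. The function name `usable_fraction` is hypothetical.

```python
def usable_fraction(num_busses, requests_per_bus=1.0):
    """Fraction of a bus's slots left for its own requests under broadcast.

    Assumed model: each bus carries its own requests plus one snoop for
    every request issued on each of the other busses.
    """
    own = requests_per_bus
    snoops_in = requests_per_bus * (num_busses - 1)
    return own / (own + snoops_in)


four_bus = usable_fraction(4)  # 1 own slot out of 4 total
```

Under this model the usable fraction is simply one over the number of busses, which is why the penalty grows as bus segments are added.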
A coherency directory eliminates (filters) snoops to busses known not to contain the requested cache line. Maximizing the coherence directory's tracking capability results in a higher hit rate and therefore better performance.
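The filtering behavior can be illustrated with a minimal sketch. The class `SnoopFilter` and its methods are hypothetical, and the handling of a directory miss (falling back to a broadcast) is an assumption made for the example; an inclusive directory could instead treat a miss as "no cache holds the line".

```python
class SnoopFilter:
    """Tracks which busses have requested each cache line (sketch only)."""

    def __init__(self, num_busses):
        self.num_busses = num_busses
        self.holders = {}  # line address -> set of bus indices

    def record_request(self, line_addr, bus):
        # Remember that this bus may now hold the line.
        self.holders.setdefault(line_addr, set()).add(bus)

    def snoop_targets(self, line_addr, requester):
        others = set(range(self.num_busses)) - {requester}
        known = self.holders.get(line_addr)
        if known is None:
            # Assumed miss policy: the line is untracked, so snoop everyone.
            return others
        # Hit: snoop only the busses recorded as possible holders.
        return known & others


sf = SnoopFilter(num_busses=4)
sf.record_request(0x1000, bus=2)
hit_targets = sf.snoop_targets(0x1000, requester=0)   # only bus 2
miss_targets = sf.snoop_targets(0x2000, requester=0)  # busses 1, 2, 3
```

In this sketch a hit replaces three snoops with one, which is the sense in which a higher hit rate translates into less snoop traffic.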
Sectoring is one common method to increase coverage of the coherence directory. A typical sectoring approach uses one address tag for two adjacent cache lines. For each address tag, there are two cache (MESI) states, one for each cache line. The number of associativity classes supported by the cache directory is limited by the width (number of bits) of the physical storage array (e.g., eDRAM or SRAM) and the information stored per class within the array. One portion of the class information is the address tag field. The address tag within each associativity class must contain enough bits to identify all usable system memory locations uniquely. Taken to an extreme, the maximum system memory capacity dictates the size of the address tag field required. However, even though a system has a maximum memory capacity, the actual physical memory installed may be much less. Several reasons may explain why the maximum memory capacity is not reached. For example, the memory technology required to realize maximum capacity may not yet be available or, if available, may be too expensive. Also, the user might not require the maximum memory capacity for a particular application. In such cases, the most significant bits of the address tag field will never be used. Thus, the chip area consumed by these bits is unused and essentially wasted.
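The number of wasted tag bits can be estimated with simple arithmetic. The sketch below uses assumed, purely illustrative parameters (64-byte cache lines, two lines per sector, 4096 directory sets, a 1 TiB maximum capacity, and 64 GiB installed); none of these values comes from the text, and the function names are hypothetical.

```python
def address_bits(capacity_bytes):
    """Bits needed to address every byte in capacity_bytes."""
    return (capacity_bytes - 1).bit_length()


def tag_bits(capacity_bytes, line_bytes=64, lines_per_sector=2, num_sets=4096):
    """Tag width once line-offset, sector, and set-index bits are removed."""
    offset_bits = (line_bytes - 1).bit_length()
    sector_bits = (lines_per_sector - 1).bit_length()
    index_bits = (num_sets - 1).bit_length()
    return address_bits(capacity_bytes) - offset_bits - sector_bits - index_bits


MAX_CAPACITY = 1 << 40  # assumed maximum: 1 TiB
INSTALLED = 1 << 36     # assumed installed memory: 64 GiB

max_tag = tag_bits(MAX_CAPACITY)      # width the hardware must provide
needed_tag = tag_bits(INSTALLED)      # width the installed memory uses
unused_per_class = max_tag - needed_tag
```

With these assumed numbers the tag is sized at 21 bits but only 17 are ever exercised, leaving 4 high-order bits idle in every associativity class.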
In a cache-coherent distributed memory (NUMA) computer system, total system memory is subdivided among the various nodes. For various reasons, such systems are often configured with gaps in the system address map. One motivation for doing this may be programming simplicity, achieved by allocating an equal portion of the total system address space to each node. Another reason may be to allow additional address space on each node for systems supporting hot memory add. For systems configured in this way, the amount of physical memory, such as dynamic random access memory (DRAM), may be significantly less than the span of system addresses. For a directory-based coherence protocol, system address gaps necessitate a larger address tag (number of bits) than if the system addresses were contiguous. As a result, address tag bits may go unused.
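The effect of address gaps on tag width can be shown with a small worked example. The configuration is hypothetical: four nodes, each given an equal 256 GiB window of the address map but populated with only 16 GiB of DRAM.

```python
GIB = 1 << 30
NODES = 4                       # assumed node count
WINDOW_PER_NODE = 256 * GIB     # assumed address window per node
INSTALLED_PER_NODE = 16 * GIB   # assumed DRAM actually installed per node


def node_base(node):
    """First system address of a node's window in the gapped map."""
    return node * WINDOW_PER_NODE


# Highest address that is actually backed by DRAM in the gapped map.
highest_used = node_base(NODES - 1) + INSTALLED_PER_NODE - 1
bits_with_gaps = highest_used.bit_length()

# Address width if the same DRAM were mapped contiguously from zero.
bits_contiguous = (NODES * INSTALLED_PER_NODE - 1).bit_length()

extra_bits = bits_with_gaps - bits_contiguous
```

In this assumed layout the directory must handle 40-bit addresses even though 36 bits would cover all installed memory, so four address bits exist only to span the gaps.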
Generally, cache directory performance is enhanced in proportion to the number of associativity classes in the cache directory. When a system employs certain memory configurations (such as those with less memory than the maximum capacity for the system) each associativity class may have one or more unused higher order bits. Current systems do not employ such unused bits to create new associativity classes.
Therefore, there is a need for a system that employs unused tag bits from several associativity classes to create additional associativity classes.
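The potential gain can be illustrated with arithmetic alone. The sketch below does not describe any particular mechanism for reclaiming bits; it only counts how many classes fit in a fixed array width. All values are assumptions chosen for illustration: a 144-bit array, a 21-bit full tag, a 17-bit tag when the high-order bits are unused, and per-class information consisting only of the tag plus two 2-bit MESI states. Real directories may store additional fields.

```python
def classes_that_fit(array_width_bits, tag_bits, lines_per_sector=2, state_bits=2):
    """Number of associativity classes that fit in one array row."""
    bits_per_class = tag_bits + lines_per_sector * state_bits
    return array_width_bits // bits_per_class


ARRAY_WIDTH = 144  # assumed storage array width in bits
FULL_TAG = 21      # assumed tag width sized for maximum memory
REDUCED_TAG = 17   # assumed tag width the installed memory needs

baseline = classes_that_fit(ARRAY_WIDTH, FULL_TAG)      # 25 bits per class
reclaimed = classes_that_fit(ARRAY_WIDTH, REDUCED_TAG)  # 21 bits per class
```

Under these assumptions the same array row holds five classes with full-width tags and six with reduced tags, so the idle bits add up to one additional class.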