1. Field of the Invention
The present invention relates generally to a system and method for cache coherency, and more particularly to a system and method for cache coherency using flexible directory bit vectors.
2. Related Art
A distributed computer system typically includes a plurality of processing nodes each having one or more processors, a cache connected to each processor, and main memory that can be accessed by any of the processors. The main memory is physically distributed among the processing nodes. In other words, each processing node includes a portion of the main memory. At any time, data elements stored in a particular main memory portion can also be stored in any of the caches existing in any of the processing nodes.
A cache coherency mechanism is conventionally utilized to maintain the coherency of data stored in the main memory portions and the caches. FIG. 1 illustrates a directory based cache coherency mechanism where a directory 106 includes an entry 108 for each memory block 104 in a portion of main memory 102 (a directory 106 exists in each processing node). The entries 108 identify the processing nodes where the associated memory blocks 104 are cached. A number of conventional approaches for achieving directory based cache coherency exist.
FIG. 2 illustrates a first conventional directory based cache coherency mechanism, in which each entry 108 includes a bit vector field 202 and a state field 206. The bit vector field 202 includes N bits, where each bit is associated with a processing node. If the memory block 104 associated with the entry 108 is cached in processing nodes A, B, and D, for example, then the bits in the bit vector field 202 corresponding to processing nodes A, B, and D are set (i.e., are equal to logical "1"). All other bits in the bit vector field 202 are not set (i.e., are equal to logical "0"). The state field 206 includes information that identifies the state of the associated memory block 104 (e.g., whether the memory block 104 is uncached, cached exclusively in one cache, cached non-exclusively by multiple caches, etc.).
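The entry format of FIG. 2 can be illustrated with a minimal sketch. The sketch is illustrative only and is not taken from any particular implementation: the names `FullVectorEntry`, `BlockState`, `add_sharer`, and `sharers`, the value N = 16, and the mapping of processing nodes A, B, and D to bit positions 0, 1, and 3 are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum

N_NODES = 16  # assumed value of N; the text does not fix a number


class BlockState(Enum):
    """Hypothetical encoding of the state field 206."""
    UNCACHED = "uncached"
    EXCLUSIVE = "exclusive"
    SHARED = "shared"


@dataclass
class FullVectorEntry:
    """One directory entry: an N-bit vector plus a state field."""
    bit_vector: int = 0
    state: BlockState = BlockState.UNCACHED

    def add_sharer(self, node: int) -> None:
        # One bit per node, so a node index of N or more cannot be recorded.
        if not 0 <= node < N_NODES:
            raise ValueError(f"node {node} exceeds the {N_NODES}-node ceiling")
        self.bit_vector |= 1 << node
        self.state = BlockState.SHARED  # simplification: non-exclusive caching only

    def sharers(self) -> list:
        # Return the indices of all nodes whose bit is set.
        return [n for n in range(N_NODES) if (self.bit_vector >> n) & 1]


entry = FullVectorEntry()
for node in (0, 1, 3):  # assumed indices for processing nodes A, B, and D
    entry.add_sharer(node)
```

In this sketch the entry identifies exactly the caching nodes, but a node whose index is N or greater cannot be represented at all.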
This first conventional approach is non-ideal, however, because it places a ceiling on the number of processing nodes that can be in the computer system. Specifically, according to this approach, the computer system can include at most N processing nodes. More processing nodes can be accommodated by increasing the size of the bit vector field 202 in each entry 108. This is not a satisfactory solution, however, since it increases directory storage overhead, which ultimately degrades system performance and limits the practical system size.
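The growth in storage overhead can be estimated with simple arithmetic. The following sketch assumes one directory entry per memory block, a 128-byte memory block, and a 2-bit state field; these figures and the function name `directory_overhead` are hypothetical and are chosen only to show the trend.

```python
def directory_overhead(n_nodes, block_bytes=128, state_bits=2):
    """Directory bits per memory block, as a fraction of the block's data bits.

    Assumes one bit per processing node plus a small state field.
    """
    entry_bits = n_nodes + state_bits
    return entry_bits / (block_bytes * 8)


for n in (16, 64, 256, 1024):
    print(f"{n:5d} nodes: {directory_overhead(n):6.1%} of memory")
```

Under these assumed parameters, the directory consumes under 2% of memory at 16 processing nodes but slightly more than 100% at 1024 processing nodes, since the entry size grows linearly with the number of nodes.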
FIG. 3 illustrates a second conventional directory based cache coherency mechanism, in which each entry 108 includes a finite number of pointer fields 302 (in this case, three pointer fields 302) and a state field 304. The pointer fields 302 store pointers to processing nodes in which the associated memory block 104 is cached. The state field 304 includes information that identifies the state of the associated memory block 104 (e.g., whether the memory block 104 is uncached, cached exclusively in one cache, cached non-exclusively by multiple caches, etc.).
This second approach does not limit cacheability. If a memory block 104 is cached in more processing nodes than the number of pointer fields 302, then the system assumes that all of the nodes are caching the block 104. Consequently, when some node requests exclusive access to the block 104, the system invalidates the copies of the block 104 in all of the nodes. This second conventional approach is therefore non-ideal, since it results in many unnecessary invalidates.
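The overflow behavior of the limited pointer scheme can be sketched as follows. This is a simplified illustration under stated assumptions: the names `PointerEntry`, `overflow`, and `invalidation_targets`, the 16-node system size, and the use of an explicit overflow flag are hypothetical; only the three pointer fields and the fall-back to invalidating all nodes come from the description above.

```python
from dataclasses import dataclass, field

N_NODES = 16       # assumed system size
MAX_POINTERS = 3   # three pointer fields 302, as in FIG. 3


@dataclass
class PointerEntry:
    """One directory entry holding a limited number of node pointers."""
    pointers: list = field(default_factory=list)
    overflow: bool = False  # hypothetical flag: set once the pointers run out

    def add_sharer(self, node: int) -> None:
        if self.overflow or node in self.pointers:
            return
        if len(self.pointers) < MAX_POINTERS:
            self.pointers.append(node)
        else:
            # No free pointer: precise sharer information is lost from here on.
            self.overflow = True

    def invalidation_targets(self, requester: int) -> list:
        # After overflow, every node is assumed to hold a copy of the block.
        candidates = range(N_NODES) if self.overflow else self.pointers
        return [n for n in candidates if n != requester]


entry = PointerEntry()
for node in (0, 1, 3):
    entry.add_sharer(node)
precise = entry.invalidation_targets(requester=0)    # only the recorded sharers

entry.add_sharer(9)                                  # fourth sharer overflows
broadcast = entry.invalidation_targets(requester=0)  # every other node
```

With three recorded sharers, an exclusive request from node 0 produces two invalidates. Once a fourth node caches the block, the same request produces fifteen invalidates, although only three other nodes hold copies.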
The entry format shown in FIG. 2 can also be used to support a third conventional directory based cache coherency mechanism. According to this third approach, each bit of the bit vector 202 is associated with a group of processing nodes. For example, if the memory block 104 associated with an entry 108 is cached in processing nodes A, B, and J, and processing nodes A and B are in Group 1 and processing node J is in Group 2, then the bits in the bit vector field 202 corresponding to Groups 1 and 2 are set, and all other bits in the bit vector field 202 are not set. The state field 206 includes information that identifies the state of the associated memory block 104 (e.g., whether the memory block 104 is uncached, cached exclusively in one cache, cached non-exclusively by multiple caches, etc.).
This third conventional approach is non-ideal, however, because its representation of the caching state is very imprecise. This imprecision degrades system performance. Suppose, in the above example, that each group contains eight processing nodes, such that processing nodes A-H are in Group 1, and processing nodes I-P are in Group 2. Suppose again that the memory block 104 associated with an entry 108 is cached in processing nodes A, B, and J, such that the bits in the bit vector field 202 corresponding to Groups 1 and 2 are set. Now suppose that processing node A has been granted exclusive access to the memory block 104. In this case, an invalidate message must be sent to all of the processing nodes in Groups 1 and 2 (other than processing node A), even though the memory block 104 is only cached in processing nodes A, B, and J. Accordingly, the third conventional approach wastes valuable communication bandwidth, thereby degrading system performance. Note that this problem exists even when the third approach is used in small computer systems.
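The cost of this imprecision can be counted directly. The sketch below is illustrative only: the function names `coarse_vector` and `invalidation_targets` and the numbering of processing nodes A through P as 0 through 15 are assumptions, while the group size of eight and the sharers A, B, and J follow the example above.

```python
GROUP_SIZE = 8  # eight processing nodes per group, as in the example


def coarse_vector(sharers):
    """Set one bit per group that contains at least one caching node."""
    vector = 0
    for node in sharers:
        vector |= 1 << (node // GROUP_SIZE)
    return vector


def invalidation_targets(vector, n_groups, requester):
    """Every node in every marked group, other than the requester."""
    targets = []
    for group in range(n_groups):
        if (vector >> group) & 1:
            first = group * GROUP_SIZE
            targets.extend(
                n for n in range(first, first + GROUP_SIZE) if n != requester
            )
    return targets


# Assumed numbering: A=0, B=1, ..., J=9, ..., P=15.
sharers = [0, 1, 9]              # processing nodes A, B, and J
vector = coarse_vector(sharers)  # bits for Group 1 and Group 2 are set
sent = invalidation_targets(vector, n_groups=2, requester=0)
needed = [n for n in sharers if n != 0]
```

Here fifteen invalidate messages are sent when node A obtains exclusive access, although only two of them (to nodes B and J) reach a node that actually caches the block.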
Thus, what is required is an improved cache coherency mechanism in a computer system that results in minimal if any system performance degradation, and that requires minimal if any directory storage overhead.