1. Field of the Invention
The present invention relates to efficient storage of memory line states in shared memory multiprocessor systems using directory based cache coherence.
2. Description of the Related Art
Processors use on-chip or off-chip cache memories to speed up accesses to system memory. In a shared memory multiprocessor system, more than one processor can store a copy of the same memory location (or line) in its cache memory. A cache coherence mechanism is therefore needed to maintain consistency among the multiple cached copies of the same memory line. In small, bus based multiprocessor systems, the coherence mechanism is usually implemented as part of the cache controller using a snoopy coherence protocol. The well-known snoopy protocol cannot be used in large systems connected through an interconnection network, because such systems lack a shared bus to snoop. As a result, these systems use a directory based protocol to maintain cache coherence.
The directories within each node of the network are associated with the main memory and keep state information on the memory lines, such as which caches hold a copy of a line and whether the line has been modified in a cache. Conventionally, these directories are organized as “full map” directories, in which the state information of every memory line is stored by mapping each memory line to a unique location in the directory maintained by each node. The full map scheme allocates space for the state information of each memory line whether or not it is cached in another node. The drawbacks of this scheme are the large area occupied by the directory and the growth of the directory as memory size increases.
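The linear growth of a full map directory can be made concrete with a back-of-the-envelope calculation. In this minimal Python sketch, the entry layout (one presence bit per node plus one modified bit per line), the 128-byte line size, and the 16-node system are illustrative assumptions, not figures from this document.

```python
def full_map_directory_bits(memory_bytes: int, line_bytes: int, num_nodes: int) -> int:
    """Bits needed by a full map directory (illustrative model).

    Assumed entry layout: one presence bit per node plus one modified bit.
    Every memory line gets an entry, whether or not it is cached anywhere.
    """
    num_lines = memory_bytes // line_bytes
    bits_per_entry = num_nodes + 1  # presence vector + modified bit (assumption)
    return num_lines * bits_per_entry


if __name__ == "__main__":
    GIB = 1 << 30
    MIB = 1 << 20
    for mem_gib in (1, 2, 4):
        bits = full_map_directory_bits(mem_gib * GIB, line_bytes=128, num_nodes=16)
        print(f"{mem_gib} GiB of memory -> {bits // 8 // MIB} MiB of directory")
```

Under these assumptions the loop prints 17, 34, and 68 MiB: every doubling of memory doubles the directory, even if most lines are never cached remotely.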
To solve this problem, “sparse directories” have been proposed, which can store the state of only a limited number of memory lines. The drawbacks of the sparse directory approach are the performance loss caused by forced invalidation of lines from the processors' caches when the directory runs out of space, and a relatively complex coherence protocol. Recently, a complete and concise remote (CCR) directory scheme has been proposed (U.S. Pat. No. 6,338,123, incorporated herein by reference) in which the directory keeps state information only on the memory lines that are currently cached in a remote node. This scheme has the advantages that its size is proportional to the size of the caches in the system and that it never has to force invalidations. However, it is desirable to reduce the directory size even further (because the directory size is proportional to the number of memory lines cached in the system, and cache sizes in contemporary systems keep growing) and to prevent the directory from growing linearly with the number of nodes.
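The forced-invalidation cost of a sparse directory can be illustrated with a small Python model. The `SparseDirectory` class, its `record_sharer` method, and the least-recently-used victim choice are hypothetical; the text above does not specify a replacement policy. When a new line arrives and the directory is full, the evicted victim and its sharers are returned, standing for the cached copies that would have to be invalidated.

```python
from __future__ import annotations

from collections import OrderedDict


class SparseDirectory:
    """Fixed-capacity sparse directory (illustrative sketch).

    Tracks sharers for at most `capacity` lines. When full, a victim entry
    is evicted and every cached copy of that line must be invalidated
    (a "forced invalidation"). LRU victim selection is an assumption.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[int, set[int]] = OrderedDict()

    def record_sharer(self, line: int, node: int) -> tuple[int, set[int]] | None:
        """Add `node` as a sharer of `line`; return (victim_line, sharers) if one was evicted."""
        victim = None
        if line in self.entries:
            self.entries.move_to_end(line)  # refresh LRU position
        else:
            if len(self.entries) >= self.capacity:
                victim = self.entries.popitem(last=False)  # least recently used entry
            self.entries[line] = set()
        self.entries[line].add(node)
        return victim
```

For example, with `capacity=2`, touching lines `0x100`, `0x200`, then `0x100` again and finally `0x300` evicts `0x200`, so node 2's copy of that line would be invalidated even though it may still be in use.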
It is, therefore, an object of the present invention to provide a method for maintaining coherence of memory lines in a shared memory multiprocessor system that includes a system area network and a plurality of compute nodes connected to the system area network. FIG. 4 represents one example of such a system. Each of the compute nodes includes a main memory, a shared cache, a coherence controller, and a directory. The invention, sometimes referred to herein as the “dynamic CCR/sparse directory implementation,” includes a method of: maintaining state information of the main memory cached in the shared caches of the other compute nodes; organizing a cache directory so that the state information can be stored in a first, area-efficient CCR directory format; switching to a second, sparse directory format if the corresponding line is shared by more than one other compute node; and dynamically switching between formats so as to maximize the number of entries stored in the directory.
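The format-selection step of this method can be sketched per line in Python. The `DynamicDirectory` class and its methods are hypothetical, and demoting a line back to the CCR format when its sharer count drops to one is an assumed reading of dynamically switching between formats.

```python
from __future__ import annotations

CCR, SPARSE = "CCR", "sparse"


class DynamicDirectory:
    """Hypothetical sketch of per-line format selection.

    A line cached by exactly one other node uses the compact CCR format;
    a second sharer promotes it to the sparse format. Falling back to one
    sharer demotes it to CCR again (assumed policy).
    """

    def __init__(self) -> None:
        self.sharers: dict[int, set[int]] = {}

    def add_sharer(self, line: int, node: int) -> str:
        """Record that `node` caches `line`; return the line's resulting format."""
        self.sharers.setdefault(line, set()).add(node)
        return self.format_of(line)

    def remove_sharer(self, line: int, node: int) -> str | None:
        """Drop `node` as a sharer; return the new format, or None if no entry remains."""
        nodes = self.sharers.get(line)
        if nodes is None:
            return None
        nodes.discard(node)
        if not nodes:
            del self.sharers[line]  # no remote copies left, so no entry is needed
            return None
        return self.format_of(line)

    def format_of(self, line: int) -> str:
        return CCR if len(self.sharers[line]) == 1 else SPARSE
```

A line cached by node 3 starts in the CCR format, switches to sparse when node 5 also caches it, and returns to CCR once node 5 drops its copy.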
This inventive directory mechanism is structurally similar to the CCR directory, yet it retains the sparse directory characteristic of tracking only a fraction of the lines present in the external shared caches. Each entry in this structure can store either one line in the sparse format or multiple lines in the CCR format. The invention stores entries in the CCR format whenever possible so as to maximize the number of lines stored. As more nodes begin to share a line, that line's format dynamically switches to the sparse format.
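An entry that holds either one sparse-format line or several CCR-format lines can be modeled as follows. The `Slot` class, its `ccr_capacity` parameter, and the rule that a slot holding several CCR lines cannot promote one of them in place (the caller must relocate it) are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Slot:
    """One directory entry (hypothetical model).

    In CCR mode it holds up to `ccr_capacity` lines, each with a single
    sharer; in sparse mode it holds exactly one line with any number of
    sharers. `ccr_capacity` is an assumed parameter.
    """

    ccr_capacity: int
    sparse: bool = False
    lines: dict[int, set[int]] = field(default_factory=dict)

    def try_insert(self, line: int, node: int) -> bool:
        """Record `node` sharing `line`; return False if this slot cannot hold it."""
        if line in self.lines:
            sharers = self.lines[line]
            if node in sharers:
                return True
            if not self.sparse and len(self.lines) > 1:
                # Promotion needs the whole slot; the other CCR lines would have to move.
                return False
            sharers.add(node)
            self.sparse = len(sharers) > 1  # second sharer switches the slot to sparse
            return True
        if self.sparse or len(self.lines) >= self.ccr_capacity:
            return False  # slot is dedicated to one shared line, or CCR space is full
        self.lines[line] = {node}
        return True
```

In this model a slot fills with single-sharer lines until `ccr_capacity` is reached, while a slot whose only line gains a second sharer becomes a dedicated sparse entry.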
In addition, the shared memory multiprocessor system comprises a system area network and a plurality of compute nodes connected to the system area network. Each of the compute nodes includes a main memory, a shared cache connected to the main memory, a CCR/sparse shadow directory connected to the shared cache, and a coherence controller connected to the CCR/sparse shadow directory. The directory is adapted to store entries, representing lines in the shared cache, in a first format or a second format. The first format represents a single node's sharing of a single line in the shared cache, and the second format represents the sharing of a single line in the shared cache by a plurality of nodes. The first format has one identifier bit, tag bits, one presence bit, and one modified bit. The CCR/sparse shadow directory is adapted to store entries in the first format only if the memory lines referenced by those entries are shared by exactly one node. The second format has one identifier bit, tag bits, presence bits, and one modified bit. The CCR/sparse shadow directory is adapted to store entries in the second format if the memory lines referenced by those entries are shared by more than one node, and it attempts to store entries in the first format before storing them in the second format. The CCR/sparse shadow directory is limited in size and is further adapted to evict entries if insufficient space is available to store a new entry.
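The two entry formats can be illustrated by packing their fields into integers. This Python sketch uses the field lists given above, but the bit order (identifier in the least significant bit, then the modified bit, then the presence bit or bits, then the tag) and the identifier values `0` for CCR and `1` for sparse are assumptions; the text does not fix a layout.

```python
from __future__ import annotations

CCR_ID, SPARSE_ID = 0, 1  # assumed meaning of the identifier bit

# Assumed bit order, least significant first:
#   bit 0             identifier (format)
#   bit 1             modified
#   next 1 or N bits  presence (1 bit in CCR format, one per node in sparse format)
#   remaining bits    tag


def encode_ccr(tag: int, present: bool, modified: bool, tag_bits: int) -> int:
    """Pack a first-format (CCR) record."""
    if not 0 <= tag < (1 << tag_bits):
        raise ValueError("tag does not fit in tag_bits")
    return CCR_ID | (int(modified) << 1) | (int(present) << 2) | (tag << 3)


def encode_sparse(tag: int, presence: int, modified: bool, tag_bits: int, num_nodes: int) -> int:
    """Pack a second-format (sparse) record; bit i of `presence` marks node i."""
    if not 0 <= tag < (1 << tag_bits):
        raise ValueError("tag does not fit in tag_bits")
    if not 0 <= presence < (1 << num_nodes):
        raise ValueError("presence vector does not fit in num_nodes bits")
    return SPARSE_ID | (int(modified) << 1) | (presence << 2) | (tag << (2 + num_nodes))


def decode(word: int, num_nodes: int) -> tuple[int, int, int, bool]:
    """Return (format, tag, presence, modified) for a record of either format."""
    fmt = word & 1  # the identifier bit tells the decoder how wide the presence field is
    modified = bool((word >> 1) & 1)
    if fmt == CCR_ID:
        presence = (word >> 2) & 1
        tag = word >> 3
    else:
        presence = (word >> 2) & ((1 << num_nodes) - 1)
        tag = word >> (2 + num_nodes)
    return fmt, tag, presence, modified
```

Placing the identifier bit first in this assumed layout lets `decode` determine the presence field width before reading the tag, which is one way the identifier bit can distinguish the formats.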
In the best case, the number of lines stored with the invention far exceeds the number stored in a conventional sparse directory implementation of comparable area; in the worst case, it reverts to the same number of lines as such a sparse implementation. Many technical benchmarks have shown that single-node invalidations heavily dominate multiple-node invalidations, which makes it likely that this implementation will store more lines than a sparse implementation. Hence, the invention reduces forced invalidations compared to a conventional sparse directory implementation with similar area constraints.
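The best-case and worst-case behavior can be estimated with an idealized capacity model. This Python sketch assumes that each slot is as wide as one sparse-format record, that CCR-format records pack into it without waste, and that a fraction `f` of stored lines have exactly one sharer; the helper names are hypothetical. Solving $fL/k + (1-f)L = S$ for $L$, with $S$ slots and $k$ CCR lines per slot, gives $L = S / (f/k + 1 - f)$.

```python
def sparse_record_bits(tag_bits: int, num_nodes: int) -> int:
    return 1 + tag_bits + num_nodes + 1  # identifier + tag + presence vector + modified


def ccr_record_bits(tag_bits: int) -> int:
    return 1 + tag_bits + 1 + 1  # identifier + tag + one presence bit + modified


def ccr_per_slot(tag_bits: int, num_nodes: int) -> int:
    """CCR-format lines that fit in a slot sized for one sparse-format line (assumed packing)."""
    return sparse_record_bits(tag_bits, num_nodes) // ccr_record_bits(tag_bits)


def lines_stored(num_slots: int, per_slot: int, single_sharer_fraction: float) -> float:
    """Lines held when a fraction of them have exactly one sharer (idealized model).

    Single-sharer lines pack `per_slot` to a slot; shared lines take a whole slot.
    Solves  f*L/per_slot + (1-f)*L = num_slots  for L.
    """
    f = single_sharer_fraction
    return num_slots / (f / per_slot + (1.0 - f))
```

With 20 tag bits this model packs three CCR lines per slot for 64 nodes but only one for 16 nodes, so under these assumptions the best-case gain grows with the width of the presence vector. With no single-sharer lines, `lines_stored` returns exactly `num_slots`, matching the worst case of a sparse directory of the same area.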