1. Field of the Invention
The present invention relates to efficient processing of memory requests in cache-based systems. More specifically, the present invention relates to improved processing speed of memory requests (or other coherence requests) in the coherence controller of shared memory multiprocessor servers or in the cache controller of uniprocessor systems.
2. Description of the Related Art
Conventional computer systems often include on-chip or off-chip cache memories which are used with processors to speed up accesses to system memory. In a shared memory multiprocessor system, more than one processor can store a copy of the same memory location(s) (or line(s)) in its cache memory. A cache coherence mechanism is required to maintain consistency among the multiple cached copies of the same memory line. Furthermore, a network protocol such as the Scalable Coherent Interface (SCI) is often used in conjunction with the conventional systems.
In small, bus-based multiprocessor systems, the coherence mechanism is usually implemented as a part of the cache controllers using a snoopy coherence protocol. The snoopy protocol cannot be used in large systems that are connected through an interconnection network due to the lack of a bus. As a result, these systems use a directory-based protocol to maintain cache coherence. The directories are associated with the main memory and maintain the state information of the various caches on the memory lines. This state information includes data indicating which cache(s) has a copy of the line or whether the line has been modified in a cache(s).
Conventionally, these directories are organized as “full map” memory directories where the state information on every single memory line is stored by mapping each memory line to a unique location in the directory. FIG. 1 is a representation of a “full map” arrangement. A memory directory 100 is provided for main memory 120. In this implementation, entries 140 of the memory directory 100 include state information for each memory line 160 of main memory 120. That is, there is a one-to-one (state) mapping between a main memory line 160 and a memory directory entry 140 (i.e., there is full mapping).
As a result, when the size of main memory 120 increases, the memory directory 100 size also increases. If the memory directory 100 is implemented as relatively fast static RAM, tracking the size of main memory 120 becomes prohibitively expensive. If the memory directory 100 is implemented using slow static RAMs or DRAMs, higher cost is avoided. However, a penalty is incurred in overall system performance due to the slower static RAM or DRAM chips. In fact, each directory access in such implementations will take approximately 5-20 controller cycles to complete.
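By way of illustration only, the one-to-one mapping of FIG. 1 and the resulting growth of the directory with main memory may be sketched as follows. The class names, the 128-byte line size, and the memory sizes are assumptions chosen for the example and do not appear in the arrangement described above.

```python
from dataclasses import dataclass, field

LINE_SIZE = 128  # bytes per memory line (assumed for illustration)


@dataclass
class FullMapEntry:
    """State kept for one memory line (hypothetical layout)."""
    sharers: set = field(default_factory=set)  # caches holding a copy
    modified: bool = False                     # line modified in a cache


class FullMapDirectory:
    """One directory entry per memory line, as in a full map arrangement."""

    def __init__(self, memory_bytes, line_size=LINE_SIZE):
        self.line_size = line_size
        # The entry count is fixed by the memory size, not by actual sharing.
        self.entries = [FullMapEntry() for _ in range(memory_bytes // line_size)]

    def entry_for(self, address):
        # Each line maps to its own unique entry.
        return self.entries[address // self.line_size]


small = FullMapDirectory(memory_bytes=64 * 1024)
large = FullMapDirectory(memory_bytes=128 * 1024)
```

In this sketch, doubling the main memory doubles the number of directory entries, regardless of how many lines are actually cached.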
In order to address this problem, “sparse” memory directories have been conventionally used in place of the (“full map”) memory directories. FIG. 2 is a representation of a sparse directory arrangement. A sparse directory 200 is smaller in size than the memory directory 100 of FIG. 1 and is organized as a subset of the memory directory 100. The sparse directory 200 includes state information entries 240 for only a subset of the memory lines 260 of main memory 220. That is, multiple memory lines are mapped to a location in the sparse directory 200. Thus, due to its smaller size, a sparse directory 200 can be implemented in an economical fashion using fast static RAMs.
However, when there is contention among memory lines 260 for the same sparse directory entry field 240, the state information of one of the lines 260 must be replaced. There is no backup state information in the sparse directory arrangement. Therefore, when a line 260 is replaced from the sparse directory 200, all the caches in the overall system having a copy of that line must be asked to invalidate their copies. This incomplete directory information leads to both coherence protocol complexity and performance loss.
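The contention described above may be sketched as follows, by way of example only. The sketch assumes a direct-mapped sparse directory indexed by line number modulo the number of entries; the class name, the method name `record_sharer`, and the 128-byte line size are hypothetical and are not taken from FIG. 2.

```python
class SparseDirectory:
    """Direct-mapped sparse directory: many memory lines share one entry."""

    def __init__(self, num_entries, line_size=128):
        self.num_entries = num_entries
        self.line_size = line_size
        # Each slot holds (line_number, set_of_sharing_caches) or None.
        self.slots = [None] * num_entries

    def record_sharer(self, address, cache_id):
        """Record that cache_id holds the line; return required invalidations."""
        line = address // self.line_size
        index = line % self.num_entries  # assumed mapping function
        invalidations = []
        slot = self.slots[index]
        if slot is not None and slot[0] != line:
            # Contention: the displaced line has no backup state, so every
            # cache holding a copy of it must be asked to invalidate.
            victim_line, victim_sharers = slot
            invalidations = [(victim_line, c) for c in sorted(victim_sharers)]
            slot = None
        if slot is None:
            slot = (line, set())
            self.slots[index] = slot
        slot[1].add(cache_id)
        return invalidations
```

With four entries, lines 0 and 4 contend for the same slot, so recording a sharer of line 4 forces invalidation of every cached copy of line 0 even though those copies were otherwise valid.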
Thus, there is a need for a system which improves coherence/caching efficiency without adversely affecting overall system performance and maintains a relatively simple coherence protocol environment.
It is, therefore, an object of the present invention to provide a structure and method for a system for maintaining coherence of cache lines in a shared memory multiprocessor system comprising a system area network and a plurality of compute nodes connected to the system area network. Each of the compute nodes includes a local main memory, a local shared cache and a local coherence controller. The shared caches of compute nodes external to a given compute node are defined as “external” shared caches. The coherence controller includes shadow directories, each corresponding to one of the external shared caches. Each of the shadow directories includes state information of the local main memory cached in the external shared caches.
The shadow directories include only state information of the local main memory cached in the external shared caches. Each of the shadow directories includes a plurality of sets, each of the sets includes a plurality of entries and each of the entries is a memory address of the local main memory. Furthermore, each entry includes tag bits and state bits such as a presence bit and a modified bit. The presence bit indicates whether a line of the local main memory is stored in an external shared cache and the modified bit indicates whether the line of the local main memory is modified in the external cache.
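One possible organization of such a shadow directory is sketched below for illustration. The split of a line address into a set index and tag bits, the 128-byte line size, and the names `ShadowEntry`, `ShadowDirectory`, `record`, and `sharers_of` are assumptions made for the example; only the sets, entries, tag bits, presence bit, and modified bit are taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class ShadowEntry:
    tag: int = 0            # tag bits of the local main memory address
    present: bool = False   # presence bit: line is held in the external cache
    modified: bool = False  # modified bit: line is modified in the external cache


class ShadowDirectory:
    """Shadow of one external shared cache, organized as sets of entries."""

    def __init__(self, num_sets, ways, line_size=128):
        self.num_sets = num_sets
        self.line_size = line_size
        self.sets = [[ShadowEntry() for _ in range(ways)] for _ in range(num_sets)]

    def _locate(self, address):
        line = address // self.line_size
        return line % self.num_sets, line // self.num_sets  # (set index, tag)

    def lookup(self, address):
        index, tag = self._locate(address)
        for entry in self.sets[index]:
            if entry.present and entry.tag == tag:
                return entry
        return None

    def record(self, address, modified=False):
        entry = self.lookup(address)
        if entry is None:
            index, tag = self._locate(address)
            # Simplification: take the first free entry in the set and raise
            # if none is free; replacement handling is outside this sketch.
            entry = next((e for e in self.sets[index] if not e.present), None)
            if entry is None:
                raise RuntimeError("set full")
            entry.tag, entry.present = tag, True
        entry.modified = modified
        return entry


def sharers_of(shadows, address):
    """Return ids of external caches whose shadow directory holds the line."""
    return [node for node, d in shadows.items() if d.lookup(address) is not None]
```

In this sketch a coherence controller would hold one `ShadowDirectory` per external shared cache, for example in a dictionary keyed by node identifier, and would consult all of them to find which external caches hold a given local line.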
By keeping information on the exact number of remotely cached lines, the CCR directory provides a dynamic full map directory of presently shared lines, but only uses the memory of a sparse directory. Consequently, the CCR directory has all the advantages of a full map directory. In contrast, a conventional sparse directory keeps the state information only on a subset of the memory lines that could have been remotely cached in a full map directory scheme, which leads to inferior performance and a more complex protocol when compared to a conventional full map directory.