Increasingly, state-of-the-art computer applications implement high-end tasks that require multiple processors for efficient execution. Multiprocessor systems allow parallel execution of multiple tasks on two or more central processor units ("CPUs"). A typical multiprocessor system may be, for example, a network server. Preferably, a multiprocessor system is built using widely available commodity components, such as the Intel Pentium® Pro processor (also called the "Pentium® Pro" processor), PCI I/O chipsets, P6 bus topology, and standard memory modules, such as SIMMs and DIMMs. There are numerous well-known multiprocessor system architectures, including symmetrical multiprocessing ("SMP"), non-uniform memory access ("NUMA"), cache-coherent NUMA ("CC-NUMA"), clustered computing, and massively parallel processing ("MPP").
A symmetrical multiprocessing ("SMP") system contains two or more identical processors that independently process as "peers" (i.e., no master/slave processing). Each of the processors (or CPUs) in an SMP system has equal access to the resources of the system, including memory access. A NUMA system contains two or more equal processors that have unequal access to memory. NUMA encompasses several different architectures that can be grouped together because of their non-uniform memory access latency, including replicated memory cluster ("RMC"), MPP, and CC-NUMA. In a NUMA system, memory is usually divided into local memories, which are placed close to processors, and remote memories, which are not close to a processor or processor cluster. Shared memories may be allocated into one of the local memories or distributed between two or more local memories. In a CC-NUMA system, multiple processors in a single node share a single memory and cache coherency is maintained using hardware techniques. Unlike an SMP node, however, a CC-NUMA system uses a directory-based coherency scheme, rather than a snoopy bus, to maintain coherency across all of the processors. RMC and MPP have multiple nodes or clusters and maintain coherency through software techniques. RMC and MPP may be described as NUMA architectures because of the unequal memory latencies associated with software coherency between nodes.
All of the above-described multiprocessor architectures require some type of cache coherence apparatus, whether implemented in hardware or in software. High speed CPUs, such as the Pentium® Pro processor, utilize an internal cache and, typically, an external cache to maximize the CPU speed. Because an SMP system usually operates only one copy of the operating system, the interoperation of the CPUs and memory must maintain data coherency. In this context, coherency means that, at any one time, there is but a single valid value for each datum. It is therefore necessary to maintain coherency between the CPU caches and main memory.
One popular coherency technique uses a "snoopy bus." Each processor maintains its own local cache and "snoops" on the bus to look for read and write operations between other processors and main memory that may affect the contents of its own cache. If a first processor attempts to access a datum in main memory that a second processor has modified and is holding in its cache, the second processor will interrupt the memory access of the first processor and write the contents of its cache into memory. Then, all other snooping processors on the bus, including the first processor, will see the write operation occur on the bus and update their cache state information to maintain coherency.
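The snooping behavior described above can be illustrated with a minimal sketch. The class names (SnoopyCache, Bus), the two-state "M"/"S" encoding, and the dictionary standing in for main memory are all assumptions made for illustration; they are not taken from any actual bus protocol, and a real snoopy bus tracks more states and transaction types than this.

```python
class SnoopyCache:
    """One processor's cache. Each line is held as (state, value), where the
    hypothetical state is "M" (modified) or "S" (shared)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> (state, value)

    def snoop_read(self, address, memory):
        """Called when another processor reads `address` on the bus.
        Returns True if this cache had to intervene."""
        entry = self.lines.get(address)
        if entry is not None and entry[0] == "M":
            # This cache holds the only valid copy: write it back to
            # memory before the other processor's read completes.
            memory[address] = entry[1]
            self.lines[address] = ("S", entry[1])
            return True
        return False


class Bus:
    """Shared bus: every read is visible to every other cache."""

    def __init__(self, memory, caches):
        self.memory = memory
        self.caches = caches

    def read(self, requester, address):
        # Let every other cache snoop the read and write back if needed.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_read(address, self.memory)
        value = self.memory[address]
        requester.lines[address] = ("S", value)
        return value


memory = {0x40: 1}
cpu_a, cpu_b = SnoopyCache("A"), SnoopyCache("B")
bus = Bus(memory, [cpu_a, cpu_b])
cpu_b.lines[0x40] = ("M", 99)   # B has modified the datum in its cache
value_seen_by_a = bus.read(cpu_a, 0x40)
```

In this sketch, the read by the first processor observes the value 99 rather than the stale value in memory, because the second processor's write-back happens first.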
Another popular coherency technique is "directory-based cache coherency." Directory-based caching keeps a record of the state and location of every block of data in main memory. For every shareable memory address line, there is a presence bit for each coherent processor cache in the system. Whenever a processor requests a line of data from memory for its cache, the presence bit for that cache in that memory line is set. Whenever one of the processors attempts to write to that memory line, the presence bits are used to invalidate the cache lines of all the caches that previously used that memory line. All of the presence bits for the memory line are then reset and the specific presence bit is set for the processor that is writing to the memory line. Therefore, the processors do not have to reside on a common snoopy bus because the directory maintains coherency for the individual processors.
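The presence-bit bookkeeping described above can be sketched as follows. The Directory class, its method names, and the use of an integer bitmask per memory line are hypothetical choices for illustration. The sketch also assumes that the writing processor's own copy is not invalidated, which the text does not specify.

```python
class Directory:
    """Presence bits per memory line, stored as an integer bitmask
    (bit c set means cache c may hold the line)."""

    def __init__(self, num_caches):
        self.num_caches = num_caches
        self.presence = {}  # memory line number -> bitmask

    def read(self, cache_id, line):
        # A cache fetched the line: set its presence bit.
        self.presence[line] = self.presence.get(line, 0) | (1 << cache_id)

    def write(self, cache_id, line):
        # Find every other cache that holds the line; these must be invalidated.
        mask = self.presence.get(line, 0)
        invalidated = [
            c for c in range(self.num_caches)
            if ((mask >> c) & 1) and c != cache_id
        ]
        # Reset all presence bits, then set only the writer's bit.
        self.presence[line] = 1 << cache_id
        return invalidated


directory = Directory(num_caches=4)
directory.read(0, line=7)
directory.read(2, line=7)
to_invalidate = directory.write(1, line=7)  # caches 0 and 2 must drop line 7
```

After the write, only the presence bit of cache 1 remains set for line 7.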
Directory-based coherency schemes that have a directory entry for every cache line in main memory can become prohibitively large. For example, a 1 Gbyte main memory may typically comprise 33,554,432 memory lines or blocks, where each line contains 32 bytes of data (equivalent to a cache line in Pentium® Pro processors). A corresponding "full" directory contains a memory line status table ("MLST") that has 33,554,432 entries, where each directory entry in the MLST contains several state bits. The state bits are typically MESI-type bits that indicate whether a cache line has been modified by a CPU, and whether a cache line is shared by two or more CPUs or is exclusively controlled by a single CPU.
For example, if the full directory for the 1 Gbyte main memory described above stored four (4) state bits per entry, then sixteen (16) megabytes of RAM are needed to store the entire MLST. The RAM requirements are even higher if ECC bits are also stored in the MLST. The full directory becomes prohibitively expensive if it is implemented using SRAM.
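The sizing figures above follow directly from the stated parameters, as this short calculation shows. The constant names are illustrative only, and the figure excludes any ECC bits.

```python
MEMORY_BYTES = 1 << 30   # 1 Gbyte of main memory
LINE_BYTES = 32          # one memory line (cache line) is 32 bytes
STATE_BITS = 4           # state bits stored per directory entry

entries = MEMORY_BYTES // LINE_BYTES     # one MLST entry per memory line
mlst_bytes = entries * STATE_BITS // 8   # total state storage in bytes

print(entries)                  # 33554432
print(mlst_bytes // (1 << 20))  # 16 (megabytes)
```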
U.S. patent application Ser. No. 08/762,636, incorporated by reference above, discloses a limited-sized directory which caches state bits for only a subset of the 32-byte blocks from main memory in a direct-mapped cache using well-known caching techniques. Entries in the limited directory are accessed by submitting the same address used to access main memory. The N most significant address bits are stored as "tag" bits in a tag array (or tag field) in the limited directory. The corresponding state bits are stored in a state array (or state field) and ECC bits may also be stored. The M next most significant address bits of the current address are used as an index to point to specific directory entries. If the N most significant bits stored in the tag array match the N most significant bits of the current address, a "HIT" has occurred. If the bits do not match, a "MISS" has occurred and a replacement coherency transaction is executed to update the entry in the limited directory.
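The tag and index lookup described above can be sketched as a small direct-mapped structure. The bit widths here are assumptions chosen only for illustration: a 30-bit address (1 Gbyte), 5 low-order offset bits for a 32-byte block, M = 15 index bits, and N = 10 tag bits. The LimitedDirectory class and its methods are hypothetical, and the replace method only overwrites the entry; it does not model the full replacement coherency transaction.

```python
ADDRESS_BITS = 30   # assumed: 1 Gbyte physical address space
OFFSET_BITS = 5     # 32-byte blocks
INDEX_BITS = 15     # M: selects one directory entry (assumed width)
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS  # N = 10


def split_address(address):
    """Return (tag, index): the N most significant bits and the next M bits."""
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


class LimitedDirectory:
    def __init__(self):
        size = 1 << INDEX_BITS
        self.tags = [None] * size     # tag array
        self.states = [None] * size   # state array

    def lookup(self, address):
        tag, index = split_address(address)
        if self.tags[index] == tag:
            return "HIT", self.states[index]
        return "MISS", None

    def replace(self, address, state):
        # Simplified stand-in for the replacement transaction on a MISS.
        tag, index = split_address(address)
        self.tags[index] = tag
        self.states[index] = state


directory = LimitedDirectory()
address = 0x12345678
first = directory.lookup(address)             # ("MISS", None)
directory.replace(address, "S")
second = directory.lookup(address)            # ("HIT", "S")
conflict = directory.lookup(address ^ (1 << 25))  # same index, different tag
```

Two addresses that share the same index bits but differ in their tag bits compete for the same directory entry, which is why the second address above misses.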
The invention disclosed in U.S. patent application Ser. No. 08/762,636 takes advantage of the fact that rarely is all, or even a large portion, of main memory being cached at any given time by the CPUs in the multiprocessor system. Hence, a coherency directory may be implemented as a direct-mapped cache that uses a much smaller amount of very fast SRAM. The limited directory can store state bits, tag bits and ECC bits for a much smaller subset of the 32-byte blocks in main memory without incurring a significant performance penalty due to cache misses.
It is well known that data in caches, including SRAM caches, can become corrupted. In the case of a multiprocessor system implementing a limited directory cache, the system may stall (or "hang") due to the Pentium® Pro bus protocol when coherency has not been maintained. This stall will prevent the system software from logging errors and shutting the system down in a controlled manner, thereby also preventing a more efficient recovery.
Therefore, there is a need in the art for improved multiprocessor systems that implement more fault-tolerant directory-based coherency algorithms. In particular, there is a need in the art for directory-based coherency systems and methods that ensure reliable system shutdown after the detection of fatal errors that are caused by, or may result in, a corrupted coherency directory. There is a still further need, in a multiprocessor system implementing directory-based coherency, for improved systems and methods that dynamically change the directory-based coherency algorithms after detection of coherency corruption in order to controllably shut down the multiprocessor system.