1. Field of the Invention
The present invention relates generally to multiprocessor computer systems, and more particularly, to a system and method for partitioning to support high availability of system resources.
2. Related Art
A distributed computer system typically includes a plurality of processing nodes each having one or more processors, a cache connected to each processor, and main memory that can be accessed by any of the processors. The main memory is physically distributed among the processing nodes. In other words, each processing node includes a portion of the main memory. At any time, data elements stored in a particular main memory portion can also be stored in any of the caches existing in any of the processing nodes.
As multiprocessors grow larger, the probability of a hardware failure increases because the system contains more components. In addition, a single software or hardware failure can make a large number of processors unavailable. It is desirable that the effects of a software or hardware failure be contained to a subset of the entire system, thus allowing the multiprocessor to be more highly available.
Access rights have been used for years by software to protect resources. However, access rights have not been commonly used to protect hardware resources in an effort to improve hardware availability. The Stanford FLASH implemented a single bit in its memory directory to protect sections of memory (see J. Kuskin et al., The Stanford FLASH Multiprocessor, Proceedings of the 21st International Symposium on Computer Architecture, pp. 302-313, April 1994). However, this scheme of protection supported only the concept of "us versus them." Only intra-partition communication is protected by such a scheme. Communication between two partitions cannot be protected because once inter-partition communication begins, a third partition can access and corrupt any resource associated with either of the two communicating partitions.
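The limitation of a single protection bit can be illustrated with a toy model. The sketch below is not the FLASH hardware design; the names DirectoryEntry, owner_partition, remote_access_allowed, and may_access are hypothetical and chosen only for illustration. It assumes each directory entry records an owning partition plus one bit that either admits or excludes every other partition.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry guarding one section of memory."""
    owner_partition: int
    # The single protection bit: it cannot name WHICH remote partition is allowed.
    remote_access_allowed: bool = False


def may_access(entry: DirectoryEntry, requester_partition: int) -> bool:
    """Return True if the requesting partition may access the guarded memory."""
    if requester_partition == entry.owner_partition:
        return True  # "us": the owning partition always has access
    # "them": one bit covers all other partitions alike
    return entry.remote_access_allowed


# Partition 0 owns the memory and initially keeps it closed to everyone else.
entry = DirectoryEntry(owner_partition=0)
closed_to_partition_1 = not may_access(entry, 1)

# Partition 0 opens the memory so that it can communicate with partition 1.
entry.remote_access_allowed = True
partition_1_admitted = may_access(entry, 1)

# Side effect: partition 2, which was never meant to take part, is admitted too.
partition_2_admitted = may_access(entry, 2)
```

In this model, setting the bit for the benefit of partition 1 necessarily admits partition 2 as well, which is the "us versus them" weakness described above. Distinguishing individual remote partitions would require more state per entry than one bit.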
Thus, what is required is an improved partitioning system that results in minimal, if any, system performance degradation, and requires minimal directory storage overhead while supporting high availability of system resources.