When highly available multiprocessor systems experience memory errors, the contamination caused by those errors can be contained by partitioning nodes into error containment clusters of nodes ("ECCNs"). As disclosed in co-pending, commonly assigned U.S. patent application "ERROR CONTAINMENT CLUSTER OF NODES," Ser. No. 08/720,368, filed Sept. 27, 1996, now U.S. Pat. No. 5,845,071, each ECCN is predefined as a discrete group of nodes. Each node within each ECCN is further defined to have protected and unprotected memory. Processors on nodes within an ECCN may write to and access any memory within their own ECCN, but may write to and access only the unprotected regions of nodes within other ECCNs. In this way, contamination caused by an error is limited to the local ECCN and the unprotected memory regions of remote ECCNs. Once such an error is detected, it is then possible to selectively purge and re-initialize only those nodes and parts of nodes that have become contaminated.
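The access rule described above can be illustrated with a minimal sketch. The `Node` type, its field names, and the assumption that each node's protected memory is a single contiguous range below `protected_limit` are hypothetical and chosen only for illustration; the referenced patent may define protected regions differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    eccn_id: int          # the ECCN this node belongs to
    protected_limit: int  # assumption: addresses below this value are protected


def may_access(requester: Node, target: Node, address: int) -> bool:
    """Return True if a processor on `requester` may read or write `address` on `target`."""
    if requester.eccn_id == target.eccn_id:
        # Same ECCN: both protected and unprotected memory are reachable.
        return True
    # Different ECCN: only the unprotected region is reachable.
    return address >= target.protected_limit
```

Under these assumptions, an erroneous write originating in one ECCN can reach at most that ECCN's own memory and the unprotected regions of other ECCNs, which is the containment boundary the paragraph describes.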
Highly available systems of the current art also advantageously operate in a network cache environment. That is, each node has its own network memory cache for faster retrieval of frequently referenced data. Data from any remote node may be taken from that remote node's main memory and encached in the local node's network cache. A processor requesting data not found in local node memory may then check the local network cache before issuing a memory access request to a remote node. If the data required by the processor is in the local network cache, it is immediately available to the processor. This obviates the need for the processor to issue a remote memory access request, so the processor can complete its task more quickly. The processing overhead of issuing and satisfying a remote memory access request is also saved.
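The lookup order described above (local memory, then the local network cache, then a remote request) can be sketched as follows. The `CachingNode` class, its dictionary-backed memory, and the `remote_requests` counter are hypothetical names introduced for illustration; cache coherence, eviction, and writes are deliberately omitted.

```python
from typing import Dict, Tuple


class CachingNode:
    def __init__(self, node_id: int, main_memory: Dict[int, int]):
        self.node_id = node_id
        self.main_memory = main_memory
        # Network cache keyed by (home node id, address); holds remote data only.
        self.network_cache: Dict[Tuple[int, int], int] = {}
        self.remote_requests = 0  # counts remote memory access requests issued

    def read(self, home: "CachingNode", address: int) -> int:
        """Read `address`, whose home memory is on node `home`."""
        if home is self:
            # Data lives in local node memory: no cache lookup needed.
            return self.main_memory[address]
        key = (home.node_id, address)
        if key in self.network_cache:
            # Hit in the local network cache: no remote request is issued.
            return self.network_cache[key]
        # Miss: issue a remote request, then encache the result locally.
        self.remote_requests += 1
        value = home.main_memory[address]
        self.network_cache[key] = value
        return value
```

In this sketch, a second read of the same remote address is served from the local network cache, so `remote_requests` does not increase. That is the speed and overhead saving the paragraph refers to.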
A problem arises, however, when the network cache operating environment is implemented in a system using ECCN partitioning. Under traditional network cache principles, every node has a network cache in which data from any remote node can be encached. It follows that the ability of ECCN partitioning to limit error contamination in such an environment is severely compromised: the universally shared network caches prevent complete isolation of the protected regions within ECCNs.
There is therefore a need in the art for a system employing a network cache environment that can maintain the error containment advantages of ECCN partitioning. Under such a system, processing advantages of improved speed and overhead economy would still be enabled through availability of network cache, while error containment advantages would also be available in the event of a memory error.