1. Field of the Invention
The present invention relates to fault-isolated distributed shared memory multi-processor systems and environments.
2. Related Art
In distributed shared memory (DSM) systems, physical memory is distributed among a plurality of processing nodes. The distributed memory is addressable as a single block of memory. Generally, processors within any of the processing nodes can access physical memory on any other node.
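The single-block addressing described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every node contributes an equal, contiguous slice of the global address space; the names `NODE_MEM_BYTES` and `locate` are hypothetical and do not come from any particular DSM implementation.

```python
# Assumption for illustration: each node contributes 1 GiB of physical memory,
# laid out contiguously in node order within one global address space.
NODE_MEM_BYTES = 1 << 30


def locate(global_addr, num_nodes, node_mem_bytes=NODE_MEM_BYTES):
    """Map a global address to (home node, offset within that node's memory)."""
    total = num_nodes * node_mem_bytes
    if not 0 <= global_addr < total:
        raise ValueError("address outside the global address space")
    # Integer division picks the home node; the remainder is the local offset.
    return divmod(global_addr, node_mem_bytes)
```

Under these assumptions, a processor on any node can resolve any global address to the node that physically holds it, which is what allows one node to access memory on another.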
One problem faced by DSM systems is fault containment. DSM systems can include tens, hundreds and, in some instances, even thousands of processors plus many other components. Larger systems have more components that can fail and, as a result, failures tend to occur more frequently. When a fault occurs in a process, a processing task, a thread, a processor, physical memory or any other part of a processing node, that node and any other nodes that may have accessed data from the failed node have to be reset. Conventional systems do not track which nodes have used which data, so nodes affected by a fault cannot be distinguished from unaffected ones. Thus, a failure in any part of a system can bring the whole system down. It is undesirable to take a whole system down each time there is a failure, especially in larger DSM systems where failures tend to occur more frequently. In order to prevent total system failure every time there is a fault, faults must be isolated.
In order to isolate failures, fire-walls can be constructed between portions, or cells, of the system. A separate kernel runs on each cell. The fire-walls prevent sharing of memory between cells and prevent the separate kernels from sharing data so that failures in one cell do not contaminate other cells.
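A strict fire-wall of this kind can be sketched as follows. This is an illustrative model only: it assumes cells are given as a mapping from a cell identifier to the nodes it contains, and the function names are hypothetical.

```python
def build_cell_map(cells):
    """Invert {cell_id: [nodes]} into {node: cell_id}."""
    node_to_cell = {}
    for cell_id, nodes in cells.items():
        for node in nodes:
            node_to_cell[node] = cell_id
    return node_to_cell


def access_allowed(node_to_cell, requesting_node, home_node):
    """Strict fire-wall: memory is reachable only from within the same cell."""
    return node_to_cell[requesting_node] == node_to_cell[home_node]


def nodes_to_reset(cells, node_to_cell, failed_node):
    """Because no data crosses a fire-wall, only the failed node's cell is reset."""
    return set(cells[node_to_cell[failed_node]])
```

For example, with `cells = {"A": [0, 1], "B": [2, 3]}`, node 0 may access memory homed on node 1 but not on node 2, and a fault on node 2 requires resetting only nodes 2 and 3.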
However, without shared memory, processes or threads that run on different cells must communicate through message passing or a similar mechanism. Thus, strict fire-walls hurt performance and require symmetric multi-processing (SMP) applications to be re-written in order to operate on a fault-isolated DSM system.
What is needed is a system, method and computer program product for selectively opening holes in fire-walls to allow pages of memory to be shared in a controlled fashion.