1. Field of the Invention
The invention relates to multiprocessing systems, and more specifically to fault-tolerant distributed multiprocessing systems.
2. Description of the Related Art
In multiprocessing systems, work is shared among several processors, each of which simultaneously executes part of the work. The processors must therefore communicate, e.g. to share work and to report on the work carried out. Multiprocessing systems are notably used to provide fault tolerance, that is, to allow the system to continue operating despite hardware failures.
There are basically two types of multiprocessing systems: symmetric multiprocessing systems (SMP systems) and distributed multiprocessing systems (DMP systems). In symmetric multiprocessing systems, several processors are provided in the same machine; they share the same memory devices and the same I/O devices. Since all processors thus work in the same environment, or at least share the same view of their environment, the operating system common to all processors distributes work among the processors. In SMP systems, shared memory is a way to rapidly exchange data between instances; sharing memory in these systems is a relatively easy task, since all processors work in the same environment and thus see the same memory devices. All processors, being in the same machine, have fast access to the memory devices; in this context, fast access refers to the speed at which a processor may access local memory, e.g. over a bus. Current communication latency between a processor and its memory is on the order of tens of nanoseconds. The problem with SMP is that making the machine fault tolerant is difficult: duplicating the machine is not a satisfactory solution in terms of cost if one of the machines is supposed to be on standby while the other one is active. Another problem is that making an SMP machine work satisfactorily is difficult when the number of processors is increased to more than 4 or 8: it requires extensive hardware expertise to arbitrate between processors.
In distributed multiprocessing systems (or clustering), a number of separate machines or hosts are connected through a local area network or another kind of network. This makes it much easier to make the system fault tolerant, since the failure of one machine does not have any direct consequence on the hardware of another machine. The problem in DMP systems is communication between processors: since the processors communicate over the network, the latency of communication between processors is several orders of magnitude higher than in an SMP system. In other words, the speed of communication over a local area network is several orders of magnitude lower than the speed at which a processor may access memory located within the same machine.
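By way of illustration only, the latency gap described above can be estimated with a short calculation. The local memory figure follows the "tens of nanoseconds" mentioned earlier, with 50 ns picked as a representative value; the local area network figure of 100 microseconds is a hypothetical value assumed for the sake of the example and is not taken from any measurement. The function name orders_of_magnitude is likewise illustrative.

```python
import math


def orders_of_magnitude(slow_latency_s: float, fast_latency_s: float) -> float:
    """Return the base-10 logarithm of the ratio between two latencies."""
    if slow_latency_s <= 0 or fast_latency_s <= 0:
        raise ValueError("latencies must be positive")
    return math.log10(slow_latency_s / fast_latency_s)


# "Tens of nanoseconds" for local memory access; 50 ns is a representative pick.
LOCAL_MEMORY_LATENCY_S = 50e-9

# ASSUMPTION: hypothetical latency for a message over a local area network.
LAN_LATENCY_S = 100e-6

gap = orders_of_magnitude(LAN_LATENCY_S, LOCAL_MEMORY_LATENCY_S)
print(f"Network access is roughly 10^{gap:.1f} times slower than local memory access")
```

Under these assumed figures the ratio is 2000, i.e. a little more than three orders of magnitude; different network hardware would shift the result, but the qualitative gap between the two kinds of access remains.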
U.S. Pat. No. 4,590,551 discusses a dual-ported memory for a data communication network support processor. A dual set of memory control circuit cards is provided, one of the cards servicing a master processor, while the other one services a slave processor. Each memory control circuit card provides a local memory for its processor, and further provides local access logic circuitry for allowing the processor to access an external shared memory. This system provides a means for a master processor and a slave processor to share memory; however, the master and slave processors do not form a distributed multiprocessing system.