Technological advances have made it possible to interconnect many low-cost processors and memories to build powerful, cost-effective computer systems. Distributing computation among the processors increases performance through parallel execution. The performance of such multiprocessor systems depends on many factors, such as flow control mechanisms, scheduling, the interconnection scheme between the processors, and the implementation of interprocess communication.
Another gain from distributed computation is added robustness against single processor failures. This additional robustness is effective only if the connection medium between the processors is extremely reliable compared to the processor components. Because such very high reliability is hard to achieve, the processors, mainly in fault-tolerant machines, are connected to redundant transport mechanisms (duplicate bus systems or redundant shared memory), with special microcode functions that switch from one medium to the other upon detection of a failure on the first one.
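The switch-over between redundant transport media described above can be illustrated by the following minimal sketch. It is an illustration only, under stated assumptions: the names Medium, RedundantTransport and MediumFailure are hypothetical, a failure is assumed to be detected at the moment a send is attempted, and the sketch is written in Python rather than in microcode.

```python
class MediumFailure(Exception):
    """Raised by a transport medium when a send cannot be completed."""


class Medium:
    """Hypothetical stand-in for one bus or one shared-memory path."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.delivered = []

    def send(self, message):
        if not self.healthy:
            raise MediumFailure(self.name)
        self.delivered.append(message)


class RedundantTransport:
    """Sends on the active medium and switches to the next one on failure."""

    def __init__(self, media):
        if not media:
            raise ValueError("at least one medium is required")
        self.media = list(media)
        self.active = 0  # index of the medium currently in use

    def send(self, message):
        # Try each medium at most once, starting with the active one.
        for _ in range(len(self.media)):
            medium = self.media[self.active]
            try:
                medium.send(message)
                return medium.name
            except MediumFailure:
                # Failure detected: switch to the next medium and retry.
                self.active = (self.active + 1) % len(self.media)
        raise MediumFailure("all media failed")
```

In this sketch the sender never selects a medium itself; it calls send and the transport object decides which medium carries the message, which mirrors the idea that the switch is handled below the level of the communicating processors.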
Moreover, the current industry trend is to give such multiprocessing machines a single system image: the application programmer is not aware that computation is distributed over several processors and can therefore develop programs independently of the underlying hardware structure. In this scheme, the single system image is handled entirely by the lower layers of the software (operating system and drivers).
Finally, fault-tolerant machines are becoming increasingly commonplace because the computer market demands permanent service (airline control, banking and so on). Many of these machines, like the one described in French Patent 2 261 568, are architected as a set of processors, each of which can be replaced by another upon detection of a failure. In such a case, a control unit saves the information that allows the back-up processor to replace the failing processor and execute its tasks.
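The replacement scheme described above can be sketched as follows. This is a simplified illustration and not the mechanism of the cited patent: the names ControlUnit, Processor and take_over are hypothetical, and the saved information is assumed to be a plain dictionary holding the pending tasks and a completion counter.

```python
import copy


class ControlUnit:
    """Hypothetical control unit keeping the last saved state per processor."""

    def __init__(self):
        self._saved = {}

    def save(self, processor_id, state):
        # Deep-copy so later changes on the processor do not alter the saved copy.
        self._saved[processor_id] = copy.deepcopy(state)

    def restore(self, processor_id):
        return copy.deepcopy(self._saved[processor_id])


class Processor:
    """Hypothetical processor whose state is a task queue and a counter."""

    def __init__(self, processor_id):
        self.processor_id = processor_id
        self.state = {"pending_tasks": [], "completed": 0}

    def run_next(self):
        task = self.state["pending_tasks"].pop(0)
        self.state["completed"] += 1
        return task


def take_over(control_unit, failed_id, backup):
    """Load the failed processor's last saved state into the backup processor."""
    backup.state = control_unit.restore(failed_id)
    return backup
```

Note that in this sketch the back-up processor resumes from the last saved state, not from the state at the instant of failure, so any work performed after the last save is repeated or lost.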
However, one of the major problems left unaddressed by the cited prior approach is the absence of a secure message passing mechanism between processors. Such a mechanism would enable said processors to communicate, either for normal use as an inter-processor communication device or, from a fault-tolerance point of view, for ensuring that a backup processor is provided with the last consistent data states used by the failing processor just before failure.