1. Field of the Invention
The invention is related to the field of multiple node processing systems and in particular to a synchronizer for each node operative to synchronize the operation of its own node with the operation of all of the other nodes in the processing system.
2. Description of the Prior Art
The earliest attempts to produce fault tolerant computer systems provided redundant computers in which each computer simultaneously executed every task required for the control operation. Voting circuits monitoring the outputs of the multiple computers determined a majority output which was assumed to be the correct output for the system. In this type of system, a faulty computer may or may not be detected and the faulty computer may or may not be turned off.
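The majority voting described above can be illustrated with a minimal sketch. It is not drawn from any of the cited systems; the function names `majority_vote` and `disagreeing` are hypothetical, and the sketch assumes each redundant computer's output can be compared for exact equality.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of computers, or None."""
    if not outputs:
        return None
    # most_common(1) yields the single most frequent value and its count.
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority means more than half of all outputs agree.
    return value if count > len(outputs) / 2 else None


def disagreeing(outputs: Sequence[Hashable], voted: Hashable) -> List[int]:
    """Return indices of computers whose output differs from the voted value."""
    return [i for i, out in enumerate(outputs) if out != voted]


# Hypothetical example: three redundant computers, one of them faulty.
outputs = [7, 7, 9]
voted = majority_vote(outputs)
suspects = disagreeing(outputs, voted)
```

In this sketch the voter only reports which computers disagreed; whether a disagreeing computer is then turned off is a separate decision, which matches the observation above that a faulty computer may or may not be detected or disabled.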
The redundant computer concept, although highly successful, is expensive because it requires multiple computers of equivalent capabilities. These systems require powerful computers because each computer has to perform every task required for the operation of the system. As an alternative, the master-slave concept was introduced, in which the operation of several computers was controlled and coordinated by a master control. The master control designated which tasks were to be executed by the individual computers. This reduced the execution time of the control operation because the computers were no longer all required to execute every task, and many of the tasks could be executed in parallel. In this type of system, when a computer is detected as faulty, the master can remove it from active participation in the system by assigning the tasks that would normally have been assigned to the faulty computer to the other computers. The problem encountered in the master-slave concept is that the system is totally dependent upon the health of the master: if the master fails, the system fails. This defect may be rectified by using redundant master controls; however, the increased cost of redundant masters limits the applicability of these systems to situations where the user is willing to pay for the added reliability. Typical of such situations are the controls of nuclear power plants, space exploration and other situations where failure of the control system would endanger lives.
Recent improvements to the master-slave and redundant execution fault tolerant computer systems discussed above are exemplified in the October 1978 Proceedings of the IEEE, Volume 66, No. 10, which is dedicated to fault tolerant computer systems. Of particular interest are the papers entitled "Pluribus: An Operational Fault Tolerant Multiprocessor" by D. Katsuki et al., Pages 1146-1159 and "SIFT: The Design and Analysis of a Fault Tolerant Computer for Aircraft Control" by J. H. Wensley et al., Pages 1240-1255. The SIFT system uses redundant execution of each system task and of the master control functions. The Pluribus system has a master copy of the most current information which can be lost if certain types of faults occur.
More recently, a new fault tolerant multiple computer architecture has been disclosed by Whiteside et al. in U.S. Pat. No. 4,256,547, in which each of the individual task execution nodes has an application processor and an operations controller which functions as a master for its own node. These operations controllers, in coordination with each other through the exchange of data and other information by means of inter-node messages, each select the task that its own node's application processor will execute. The task selection by the individual operations controllers is made on a distributed basis such that the execution of each task required for the operation of the control system may be selected by more than one of the operations controllers in a fault tolerant manner. In this system each node is assigned a subset of the tasks it is capable of selecting and executing, and no node is required to execute every task. The operations controllers are individually capable of detecting faulty nodes and excluding them from participation in the system. A predecessor of the multiple computer system is described by C. J. Walter et al. in their paper "MAFT: A Multicomputer Architecture for Fault-Tolerance in Real-Time Control Systems" published in the Proceedings of the Real Time System Symposium, San Diego, Dec. 3-6, 1985.
The present invention is a synchronizer for each node in a fault tolerant multiple node processing system. It is based on the synchronizer taught by Whiteside et al. in U.S. Pat. No. 4,323,966 and the synchronizer described in the paper by C. J. Walter et al. cited above.