1. Field of the Invention
The invention is related to the field of multiple node processing systems and in particular to an operations controller, one for each node in the multiple node processing system, each operations controller controlling the operation of its own node in a fault tolerant manner.
2. Description of the Prior Art
The earliest attempts to produce fault tolerant computer systems provided redundant computers in which each computer simultaneously executed every task required for the control operation. Voting circuits monitoring the outputs of the multiple computers determined a majority output which was assumed to be the correct output for the system. In this type of system, a faulty computer may or may not be detected and the faulty computer may or may not be turned off.
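The voting step described above can be illustrated with a minimal sketch. It assumes each redundant computer reports one comparable output value per task; the function names `majority_vote` and `disagreeing` are hypothetical and are not taken from any of the systems discussed here.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of the computers.

    Returns None when no value is reported by more than half of them.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def disagreeing(outputs, voted):
    """Return the indices of computers whose output differs from the voted value."""
    return [i for i, out in enumerate(outputs) if out != voted]


# Three redundant computers, one of which reports a deviating value.
outputs = [5, 5, 7]
voted = majority_vote(outputs)
suspects = disagreeing(outputs, voted)
```

In this sketch the voter only reports which outputs deviated. Whether the deviating computer is then identified as faulty or turned off is a separate decision, which is the limitation noted above.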
The redundant computer concept, although highly successful, is expensive because it requires multiple computers of equivalent capabilities. These systems require powerful computers because each computer has to perform every task required for the operation of the system. As an alternative, the master-slave concept was introduced in which the operation of several computers was controlled and coordinated by a master control. The master control designated which tasks were to be executed by the individual computers. This reduced the execution time of the control operation because all the computers were no longer required to execute every task, and many of the tasks could be executed in parallel. In this type of system, when a computer is detected as faulty, the master can remove it from active participation in the system by assigning the tasks that would normally have been assigned to the faulty computer to the other computers. The problem encountered in the master-slave concept is that the system is totally dependent upon the health of the master: if the master fails, the system fails. This defect may be rectified by using redundant master controls; however, the increased cost of redundant masters limits the applicability of these systems to situations where the user is willing to pay for the added reliability. Typical of such situations are the controls of nuclear power plants, space exploration and other situations where failure of the control system would endanger lives.
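The reassignment performed by the master control can be sketched as follows. This is an illustration only: the round-robin policy, the dictionary layout, and the name `reassign_tasks` are assumptions made for the example and are not drawn from any particular prior art system.

```python
def reassign_tasks(assignments, faulty):
    """Return a new task assignment that excludes the faulty computer.

    assignments maps a computer name to its list of tasks. The faulty
    computer's tasks are spread round-robin (an assumed policy) over the
    remaining computers, taken in sorted name order.
    """
    healthy = sorted(c for c in assignments if c != faulty)
    if not healthy:
        raise RuntimeError("no healthy computers remain")
    # Copy the task lists so the caller's assignment map is not modified.
    result = {c: list(assignments[c]) for c in healthy}
    for i, task in enumerate(assignments.get(faulty, [])):
        result[healthy[i % len(healthy)]].append(task)
    return result


# The master detects computer "B" as faulty and redistributes its tasks.
assignments = {"A": ["t1"], "B": ["t2", "t3"], "C": ["t4"]}
updated = reassign_tasks(assignments, "B")
```

The sketch also makes the weakness of the concept visible: `reassign_tasks` runs only on the master, so a failure of the master leaves no component able to perform the reassignment.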
Recent improvements to the master-slave and redundant execution fault tolerant computer systems discussed above are exemplified in the October 1978 Proceedings of the IEEE, Volume 66, No. 10, which is dedicated to fault tolerant computer systems. Of particular interest are the papers entitled "Pluribus: An Operational Fault Tolerant Multiprocessor" by D. Katsuki et al., Pages 1146-1159 and "SIFT: Design and Analysis of a Fault Tolerant Computer for Aircraft Control" by J. H. Wensley et al., Pages 1240-1255. The SIFT system uses redundant execution of each system task and of the master control functions. The Pluribus system has a master copy of the most current information which can be lost if certain types of faults occur.
More recently, a new fault tolerant multiple computer architecture has been disclosed by Whiteside et al. in U.S. Pat. No. 4,356,546, in which each of the individual task execution nodes has an applications processor and an operations controller which functions as a master for its own node.
The present invention is an operations controller, having improved fault tolerance and control capabilities, for a fault tolerant multiple node processing system based on the system taught by Whiteside et al. in U.S. Pat. No. 4,323,966. A predecessor of this operations controller has been described by C. J. Walter et al. in their paper "MAFT: A Multicomputer Architecture for Fault-Tolerance in Real-Time Control Systems" published in the Proceedings of the Real-Time Systems Symposium, San Diego, Dec. 3-6, 1985.