1. Field of the Invention
The present invention relates to a method of organizing and programming distributed computer systems in which processors are connected via interconnection or communication networks, such that multiple hardware and software faults occurring in both the processors and the networks do not lead to failures of application computations.
2. Description of the Prior Art
An approach for uniform treatment of hardware and software faults in a distributed computer system was described in an article by K. H. Kim and H. O. Welch, entitled "Distributed Execution of Recovery Blocks: An Approach for Uniform Treatment of Hardware and Software Faults in Real-Time Applications," IEEE Transactions on Computers, Vol. 38, No. 5, May 1989 [*Kim 89]. This paper described the distributed recovery block scheme, which was based on a combination of the distributed redundant processing concept and the recovery block structuring method. The recovery block is a language construct which was developed by J. J. Horning, H. C. Lauer, P. M. Melliar-Smith, and B. Randell, and was described in an article entitled "A Program Structure for Error Detection and Recovery," Lecture Notes in Computer Science, Vol. 16, New York: Springer-Verlag, 1974, pp. 171-187. It supports incorporation of program redundancy into a fault-tolerant program in a concise and easily readable form.
The syntax of the recovery block is as follows: ensure T by B1 else by B2 . . . else by Bn else error. Here, T denotes the acceptance test (AT), B1 is the primary version of a program module, and Bk, where 2 ≤ k ≤ n, are the backup versions. All of the versions of the program module are designed to produce the same or similar computational results. The acceptance test is a logical expression representing the criterion for determining the acceptability of the execution results of the versions. An execution of a version of the program module is thus always followed by an acceptance test. For the sake of simplicity in exposing the basic principles, the distributed recovery block scheme was described only for the cases where a recovery block contains two versions of a program module.
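The control flow of the construct "ensure T by B1 else by B2 . . . else by Bn else error" can be illustrated with a minimal sketch. The names recovery_block, RecoveryBlockError, is_sorted, faulty_primary and backup are hypothetical and do not appear in the cited articles. The sketch also assumes that the input data item is not modified by a failed version, so that no explicit state restoration is needed before the next version is tried.

```python
from typing import Any, Callable, Sequence


class RecoveryBlockError(Exception):
    """Corresponds to the final 'else error' arm: no version was acceptable."""


def recovery_block(
    acceptance_test: Callable[[Any], bool],
    versions: Sequence[Callable[[Any], Any]],
    data: Any,
) -> Any:
    """Try B1, B2, ..., Bn in order; return the first result that passes T."""
    for version in versions:
        try:
            result = version(data)
        except Exception:
            # Assumption: an exception inside a version counts as a failed test.
            continue
        if acceptance_test(result):
            return result
    raise RecoveryBlockError("all versions failed the acceptance test")


# Hypothetical example: the acceptance test T checks that the output is ordered.
def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


def faulty_primary(xs):
    return list(xs)  # deliberately wrong: returns the input unsorted


def backup(xs):
    return sorted(xs)


if __name__ == "__main__":
    print(recovery_block(is_sorted, [faulty_primary, backup], [3, 1, 2]))  # [1, 2, 3]
```

In this example the primary version fails the acceptance test, so the result of the backup version is returned. If every version failed, the sketch would raise RecoveryBlockError.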
In the distributed recovery block scheme, a recovery block was duplicated into two processor nodes which together form a distributed recovery block computing station. One of the nodes in the distributed recovery block station functioned as the primary node at any given time while the other functioned as the shadow node. The primary node used the primary version in the recovery block as its first version for processing a new input data item. The shadow node used the backup version as the first version for processing a new input data item. After receiving the common input data item from the predecessor computing station, the two nodes then proceeded concurrently to execute their respective two different versions first, followed by an application of the same acceptance test to the results.
In a fault-free situation, both nodes will pass the acceptance test with the results computed with their first used versions. In such a case, the primary node notifies the shadow of its success in the acceptance test. Thereafter, only the primary node sends its output to the successor computing stations.
However, if the primary node fails its test while the shadow node passes its test, the shadow node will take over the role of the primary as soon as it receives notice that the primary node has failed. If the primary node is completely lost, i.e., crashes, such that it is unable to notify the shadow node of the failure of its test, the shadow node will recognize the failure of the primary upon the expiration of a preset time limit.
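The three cases described above (both nodes pass, the primary fails its test, the primary crashes) can be illustrated with a simplified, sequential simulation of a two-node station. This is a sketch under stated assumptions, not the implementation of [*Kim 89]: the names Notice, run_node and drb_station are hypothetical, the two nodes run one after another rather than concurrently, the time limit is modeled only as the absence of a notice, and the case where the shadow node also fails its test is reported as "no output" because it is not covered in the description above.

```python
from enum import Enum


class Notice(Enum):
    PASSED = "passed"  # primary reported success in its acceptance test
    FAILED = "failed"  # primary reported failure in its acceptance test
    NONE = "none"      # no notice arrived before the preset time limit


def run_node(first_version, acceptance_test, data):
    """Execute a node's first version, then apply the acceptance test."""
    try:
        result = first_version(data)
    except Exception:
        return False, None
    return bool(acceptance_test(result)), result


def drb_station(primary_version, backup_version, acceptance_test, data,
                primary_crashed=False):
    """Return (sender, output, notice) for one input data item."""
    # The shadow node uses the backup version as its first version.
    shadow_ok, shadow_result = run_node(backup_version, acceptance_test, data)

    if primary_crashed:
        notice = Notice.NONE  # the shadow learns of the failure only by timeout
    else:
        primary_ok, primary_result = run_node(primary_version, acceptance_test, data)
        notice = Notice.PASSED if primary_ok else Notice.FAILED

    if notice is Notice.PASSED:
        return "primary", primary_result, notice  # only the primary sends output
    if shadow_ok:
        return "shadow", shadow_result, notice    # the shadow takes over
    return "none", None, notice                   # simplification, see lead-in


if __name__ == "__main__":
    def even(r):
        return r % 2 == 0

    print(drb_station(lambda x: 2 * x, lambda x: x + x, even, 5))
    print(drb_station(lambda x: 2 * x + 1, lambda x: x + x, even, 5))
    print(drb_station(lambda x: 2 * x, lambda x: x + x, even, 5, primary_crashed=True))
```

The third call models the crash case: the output still comes from the shadow node, but the takeover is triggered by the missing notice rather than by a failure notice.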
In this scheme, a status exchange mechanism exists between the primary node and the shadow node. Such a status exchange mechanism is necessary in order to detect the failure of the partner node as well as to minimize the frequency of both nodes sending their computation results to the successor computing stations.
Moreover, given the above scheme as a starting point, systems utilizing the scheme require increasingly more complicated types of status exchange mechanisms as the number of shadow nodes forming a distributed recovery block station increases beyond two. As a result, it is highly expensive to construct a distributed recovery block computing station which redundantly executes a recovery block containing more than two versions by use of more than two processor nodes.
In addition, and even where only two nodes are present, in the case where the primary node crashes, the shadow node can take over the role of the primary node only after the expiration of the time limit is detected. Therefore, recovery time is much greater in this case than in the case where the primary node merely fails in its acceptance test and is still capable of sending a notice to the shadow node.
Another approach for dealing with hardware faults in a distributed computer system was described in an article by K. Mori et al., entitled "Autonomous Decentralized Software Structure and Its Application," Proc. of FJCC 1986 [Mori 86*]. In this method, when a plurality of processors execute the same program module, all of the processors send their execution results to the transmission network without any selective coordination among themselves. As a result, each consumer processor on the receiving side must recognize redundancy among multiple received messages. Each consumer node then selects an acceptable one of the messages which have come through the network from different producer nodes which have executed the same program module.
With this method, the messages produced from multiple parallel executions of the same program module are collected during a fixed period of time. After the expiration of the fixed time period, an acceptable result value is selected by application of the majority decision logic in the case where the producer computing station uses more than three processors executing the same program module, to assure a high degree of data integrity. However, in the case where the producer station uses two processors for execution of the same program module, a value cannot be selected with a high degree of confidence in its data integrity. When the contents of the two messages produced from dual parallel executions of the same program do not match, there is no basis for determining which of the two is correct. In this conventional method, the same program module is executed in multiple processors. Therefore, this conventional method is effective only for hardware faults and not for software faults.
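The selection step performed by a consumer node can be sketched as follows. The function name select_by_majority is hypothetical, and the sketch assumes that the messages collected during the fixed time period are already available as a list of hashable values. It is one plausible form of majority decision logic, not the specific logic of [Mori 86*].

```python
from collections import Counter


def select_by_majority(messages):
    """Return the value reported by a strict majority of messages, else None."""
    if not messages:
        return None
    value, count = Counter(messages).most_common(1)[0]
    return value if count > len(messages) / 2 else None


if __name__ == "__main__":
    # Three producers, one hit by a hardware fault: the faulty value is outvoted.
    print(select_by_majority([42, 42, 41]))  # 42
    # Two producers that disagree: no value can be selected.
    print(select_by_majority([42, 41]))      # None
    # A software fault reproduced identically on every processor is not detected.
    print(select_by_majority([7, 7, 7]))     # 7
```

The last call illustrates the limitation noted above: because every processor executes the same program module, an identical wrong value from all of them still wins the vote.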