1. Field of the Invention
The present invention relates to checkpoint restart facilities installed in each of a number of data processors which are interconnected to form a distributed processing system.
2. Description of the Related Art
A distributed processing system is a system in which a plurality of data processors are interconnected by communication lines so that data transmission and reception can occur among the data processors. With such a system, the data processors can share data and execute distributed processing of an application program.
In data processing systems, checkpoint restart facilities are well known. When a system failure occurs, such facilities allow processing to continue from the last checkpoint that the program reached while executing normally before the failure. To this end, the facilities save, at each checkpoint, the information (i.e., checkpoint data) necessary to restart execution of the program from the point at which that information was saved.
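The general idea of such a facility can be illustrated with a minimal Python sketch. It is not taken from any particular system: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the file-based storage, and the `fail_at` parameter used to simulate a failure are all assumptions made for illustration.

```python
import os
import pickle


def save_checkpoint(path, state):
    """Write checkpoint data to a temporary file, then rename it into place,
    so that a failure during the write does not corrupt the last good checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved checkpoint data, or None if none exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum the integers 0..total_steps-1, saving a checkpoint periodically.

    On entry the function restarts from the last checkpoint if one exists.
    `fail_at` (hypothetical, for demonstration only) simulates a system
    failure just before the given step is executed.
    """
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated system failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

In this sketch, a call such as `run(path, 10, 3, fail_at=7)` fails after the checkpoint taken at step 6, and a subsequent `run(path, 10, 3)` resumes from that checkpoint rather than from step 0, re-executing only the work done after the checkpoint.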
In the distributed processing system, each of the data processors has its own checkpoint restart facilities and executes them independently of the other data processors to recover from a failure.
In the distributed processing system, however, data is transmitted and received between the data processors, so a system failure may occur in one data processor while another continues to operate properly. In such a case, the following problem arises when each data processor performs its checkpoint restart facilities independently as described above. Suppose that data is transmitted from a first data processor to a second data processor and a failure occurs in the first. The first data processor executes its checkpoint restart facilities and continues processing from a point prior to the occurrence of the failure. The second data processor, which is functioning properly, does not recognize that the restart facilities have been executed in the first, and thus cannot identify whether data received from the first is data that was already transmitted prior to the failure or fresh data. As a result, data necessary for the current processing may fail to reach the second data processor, and a malfunction may occur in data communication. This lowers the reliability of the distributed processing system.
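The problem can be made concrete with a small simulation. The sketch below is purely illustrative: the `Sender` and `Receiver` classes, the `item-N` payload format, and the in-memory checkpoint are hypothetical and stand in for a sending data processor that rolls back independently and a receiving data processor that is unaware of the rollback.

```python
class Sender:
    """Sending data processor with its own, independent checkpoint restart."""

    def __init__(self):
        self.next_item = 0   # index of the next data item to transmit
        self.checkpoint = 0  # value of next_item saved at the last checkpoint

    def take_checkpoint(self):
        self.checkpoint = self.next_item

    def send(self, receiver):
        receiver.receive(f"item-{self.next_item}")
        self.next_item += 1

    def restart(self):
        # Roll back to the last checkpoint; the receiver is not informed.
        self.next_item = self.checkpoint


class Receiver:
    """Receiving data processor that keeps running across the sender's failure."""

    def __init__(self):
        self.log = []

    def receive(self, payload):
        # Nothing in the payload reveals whether it was sent before
        # or after the sender's restart.
        self.log.append(payload)


def simulate():
    a, b = Sender(), Receiver()
    a.send(b)
    a.send(b)            # item-0 and item-1 delivered
    a.take_checkpoint()  # checkpoint taken with next_item == 2
    a.send(b)
    a.send(b)            # item-2 and item-3 delivered
    a.restart()          # failure: sender resumes from the checkpoint
    a.send(b)
    a.send(b)            # item-2 and item-3 are transmitted a second time
    return b.log
```

Under these assumptions, the receiver's log ends with `item-2` and `item-3` repeated. Having already processed `item-3`, the receiver needs `item-4` for its current processing, yet it receives earlier data again and has no means of telling that this data is not fresh.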