1. Field of the Invention
The present invention relates to a method and apparatus for checkpointing a plurality of processes operating under a distributed processing environment while carrying out communication between software processes.
2. Description of the Background Art
A conventional method for increasing the reliability of program execution on computers is periodic checkpointing of each process. When a fault occurs, the state of the process can be rolled back to a checkpoint taken before the fault occurred, and the process can be re-executed from that checkpoint state. A checkpoint state is the state of a process at some point during execution of a program, stored as the information necessary for re-execution from that checkpoint. A "checkpoint" refers to a time-point at which re-execution can be commenced when a fault occurs, and "to checkpoint" and "checkpointing" refer to the act of storing the checkpoint state.
In a system where a single process operates or one process operates independently of other processes, it suffices to checkpoint for intermediate states of that process alone. However, in a distributed system in which a plurality of processes operate in parallel while carrying out inter-process communications, it is insufficient to have checkpointed for only the single process where the fault occurred. Thus, it is necessary to checkpoint a plurality of processes that are related to each other by inter-process communications, so that these processes can be re-executed without contradiction.
Hereinafter, a checkpoint generated for an individual process is referred to as a "checkpoint," and a set of checkpoints corresponding to related processes is referred to as a "distributed checkpoint." Also, "fault process" refers to a process to be rolled back because a fault occurred in that process and "non-fault process" refers to a process that is not a "fault process."
For inter-process communications under a distributed environment, message passing and data exchange are available. Data exchange can be carried out using either shared memory or shared files. Message passing is a way of exchanging data by synchronizing the message sending process and the message receiving process. Shared memory is memory shared among a plurality of processes; each process can read or write it directly, and any other process can view what has been written. A file is likewise accessible from a plurality of processes, so a file can also be used to exchange information.
While message passing is a synchronous inter-process communication, shared memory or shared files are asynchronous inter-process communications. For both types of communications, it is necessary to generate a distributed checkpoint for the set of processes, i.e., for a "checkpoint group," that affect each other in performing inter-process communications.
FIGS. 1(a)-1(c) show examples of three types of distributed checkpoints CH1, CH2, and CH3, respectively, where processing is carried out while three processes P1, P2, and P3 exchange messages by message passing. Also, in FIGS. 1(a)-1(c), a symbol "m" indicates a message, and the two numbers following "m" indicate an identification number of the message sending process and an identification number of the message receiving process, respectively.
In FIG. 1(a), at CH1, processes P1, P2, and P3 generate checkpoints ch11, ch12, ch13, respectively. Referring to message m32, at ch13, despite the fact that process P3 is in a state of not having sent message m32, process P2 is in a state of already having received message m32 at ch12. Consequently, if re-execution is performed after rollback to distributed checkpoint CH1 upon occurrence of a fault in any process, a contradictory state results for message m32. Similarly, referring to CH3 of FIG. 1(c), a contradictory state is produced for message m23.
In contrast, referring to CH2 of FIG. 1(b), no contradictory state arises for any message, so the rollback and the re-execution can be performed correctly.
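By way of illustration only, the consistency condition discussed for FIGS. 1(a)-1(c) can be sketched as follows. This is a minimal sketch and not part of the cited prior art: the `Message` record, the function `orphan_messages`, and the numeric event positions are hypothetical, and the positions are invented because the figures are not reproduced here. A message is flagged when the receiver's checkpoint records its receipt while the sender's checkpoint does not record its transmission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str      # e.g. "m32": sent by process 3, received by process 2
    sender: str
    receiver: str
    send_pos: int  # position of the send event in the sender's local history
    recv_pos: int  # position of the receive event in the receiver's local history


def orphan_messages(checkpoints, messages):
    """Return names of messages recorded as received but not as sent.

    `checkpoints` maps a process name to the local position at which that
    process took its checkpoint. An event is part of the checkpoint state
    when its position is smaller than the checkpoint position.
    """
    orphans = []
    for m in messages:
        sent_before = m.send_pos < checkpoints[m.sender]
        received_before = m.recv_pos < checkpoints[m.receiver]
        if received_before and not sent_before:
            orphans.append(m.name)
    return orphans


# Hypothetical positions: P3 sends m32 at position 5, P2 receives it at 2.
msgs = [Message("m32", "P3", "P2", send_pos=5, recv_pos=2)]

# P2 checkpoints after the receipt, P3 before the send: contradictory.
inconsistent = {"P1": 3, "P2": 4, "P3": 3}
# P2 checkpoints before the receipt: no contradiction.
consistent = {"P1": 3, "P2": 1, "P3": 3}

print(orphan_messages(inconsistent, msgs))  # ['m32']
print(orphan_messages(consistent, msgs))    # []
```

The first set of checkpoints corresponds to the kind of contradiction described for CH1, and the second to the contradiction-free case described for CH2.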
There are two types of conventional methods for performing distributed checkpointing: 1) synchronous checkpointing and 2) asynchronous checkpointing. The conventional system can utilize only one of these methods. FIGS. 2(a) and 2(b) show synchronous and asynchronous checkpointing during processing, respectively, while messages are exchanged between three processes A, B, and C.
In a synchronous checkpointing system, checkpointing is performed by synchronizing among processes belonging to a checkpoint group. Specifically, a checkpoint such as CH1 in FIG. 2(a) is generated by producing a condition in which there is no inconsistency caused by inter-process communications among the processes. In a system described in K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Trans. Computer Syst., Vol. 3, No. 1 (February 1985) at 63-75, messages that would create contradiction are detected by mutually sending messages called markers when a distributed checkpoint is acquired. Storing these detected messages creates a consistent condition in order to perform synchronous checkpointing.
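The marker mechanism can be illustrated with a simplified two-process sketch. The following is an assumption-laden illustration rather than the algorithm as published: it assumes first-in-first-out channels, and the `Node` class, the integer "balance" state, and the transfer of 5 units are all invented for the example. A process records its own state when it starts the snapshot or first receives a marker, and stores any message that arrives on a channel after its state was recorded but before the marker on that channel.

```python
from collections import deque

MARKER = "MARKER"


class Node:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance      # the process state
        self.recorded_state = None  # state saved for the snapshot
        self.channel_state = {}     # sender -> in-flight messages stored
        self.recording = set()      # incoming channels still being recorded

    def start_snapshot(self, peers, channels):
        # Record own state, then send a marker on every outgoing channel.
        self.recorded_state = self.balance
        self.recording = set(peers)
        self.channel_state = {p: [] for p in peers}
        for p in peers:
            channels[(self.name, p)].append(MARKER)

    def deliver(self, sender, msg, peers, channels):
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot(peers, channels)
            # A marker closes recording on the channel it arrived on.
            self.recording.discard(sender)
        else:
            self.balance += msg
            if sender in self.recording:
                # Arrived after our state was recorded but before the
                # sender's marker: store it as channel state.
                self.channel_state[sender].append(msg)


def run_example():
    channels = {("P", "Q"): deque(), ("Q", "P"): deque()}
    p, q = Node("P", 10), Node("Q", 20)
    # Q sends 5 units to P; the message is still in flight.
    q.balance -= 5
    channels[("Q", "P")].append(5)
    # P initiates the snapshot while that message is in flight.
    p.start_snapshot(["Q"], channels)
    # Deliver in FIFO order: the 5 to P, P's marker to Q, Q's marker to P.
    p.deliver("Q", channels[("Q", "P")].popleft(), ["Q"], channels)
    q.deliver("P", channels[("P", "Q")].popleft(), ["P"], channels)
    p.deliver("Q", channels[("Q", "P")].popleft(), ["Q"], channels)
    return p, q


p, q = run_example()
print(p.recorded_state, q.recorded_state, p.channel_state)
```

In this run the recorded process states (10 and 15) plus the stored in-flight message (5) add up to the initial total of 30, which is the sense in which storing the detected messages yields a consistent condition.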
Apart from the system proposed by Chandy et al., J. S. Plank and K. Li propose a scheme for performing local checkpointing by establishing synchronization using a two-phase commit protocol in "ickp: A Consistent Checkpointer for Multicomputers," IEEE Parallel Distrib. Technol. Syst. Appl., Vol. 2, No. 2 (Summer 1994) at 62-67. In this scheme, all related processes are stopped in the first phase and a checkpoint condition with no message-related state is produced for each process. Then, after checkpointing of all of the processes is finished, processing of all the processes is resumed in the second phase.
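The two phases described above can be sketched as follows. This is a minimal single-machine sketch under stated assumptions, not the cited implementation: the `Process` class and the function `coordinated_checkpoint` are hypothetical, process state is reduced to one integer, and the draining of in-flight messages before the state is saved is not modelled.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.running = True
        self.state = 0     # stands in for the full process state
        self.saved = None  # last checkpointed state

    def step(self):
        # Ordinary processing only advances while the process is running.
        if self.running:
            self.state += 1

    def stop(self):
        self.running = False
        return True  # acknowledgement to the coordinator

    def checkpoint(self):
        self.saved = self.state

    def resume(self):
        self.running = True


def coordinated_checkpoint(processes):
    # Phase 1: stop every related process, then save each state.
    if not all(p.stop() for p in processes):
        for p in processes:
            p.resume()
        return False
    for p in processes:
        p.checkpoint()
    # Phase 2: only after every checkpoint is finished, resume all.
    for p in processes:
        p.resume()
    return True


procs = [Process("A"), Process("B"), Process("C")]
for p in procs:
    p.step()
ok = coordinated_checkpoint(procs)
print(ok, [p.saved for p in procs])
```

Because no process runs between the stop and the resume, no message can be sent after one checkpoint and received before another, which is why the saved states contain no message-related inconsistency in this simplified model.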
With the above synchronous-type checkpointing, all the processes are restarted from the checkpoint CH1 when a fault occurs in process B at time-point X in FIG. 2(a). FIG. 3 shows a client-server type processing in which processing requests are issued by a plurality of client processes, C1 and C2, to a server process, S, by message passing. After generating synchronous checkpoint CP1, each process continues processing with message communication. Under these circumstances, if rollback is performed to CP1 because a fault occurs at client C1 at time-point F1, server S rolls back since it was in communication with client C1. Furthermore, C2 must roll back to CP1, since it performed communication with server S. Because a plurality of client processes are usually used by different users in a client-server model system, a fault at a single client node of one user affects other client nodes of many other users.
In an asynchronous checkpointing system, as shown in FIG. 2(b), checkpointing is performed at an arbitrary time in each process. One method of implementing an asynchronous checkpointing system is disclosed in R. E. Strom and S. Yemini, "Optimistic Recovery in Distributed Systems," ACM Trans. Computer Syst., Vol. 3, No. 3, at 204-228 (1985). In FIG. 2(b), if a fault occurs in process B at a time-point indicated by the symbol X, process B rolls back to CHb. Since process B must regenerate messages m5 and m6, processes A and C also roll back to CHa and CHc, respectively. When this happens, m4 has to be regenerated by process C, in turn requiring process B to further roll back to a checkpoint prior to CHb. Such a situation, in which processes are rolled back in a chain, is called "cascade rollback."
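The chain of rollbacks can be reproduced with a small fixed-point computation. The sketch below is illustrative only and rests on assumptions not found in FIG. 2(b): all event positions are invented, no messages are logged, and m4 is assumed to be sent by process B before CHb and received by process C after CHc. The function `recovery_line` is hypothetical. It repeatedly rolls a process back whenever a message would otherwise be sent but not received, or received but not sent.

```python
import bisect


def recovery_line(checkpoints, messages, failed):
    """Compute how far each process rolls back after `failed` faults.

    checkpoints: {process: sorted checkpoint positions, starting with 0}
    messages:    list of (sender, send_pos, receiver, recv_pos), pos > 0
    Returns {process: checkpoint position, or inf if not rolled back}.
    """
    line = {p: float("inf") for p in checkpoints}
    line[failed] = checkpoints[failed][-1]

    def latest_before(proc, pos):
        # Latest checkpoint strictly before the event at `pos`.
        cps = checkpoints[proc]
        return cps[bisect.bisect_left(cps, pos) - 1]

    changed = True
    while changed:
        changed = False
        for s, sp, r, rp in messages:
            sent = sp < line[s]      # send event survives the rollback
            received = rp < line[r]  # receive event survives the rollback
            if sent and not received:
                # Receipt was undone: sender must roll back to resend.
                line[s] = latest_before(s, sp)
                changed = True
            elif received and not sent:
                # Send was undone: receiver must undo the receipt.
                line[r] = latest_before(r, rp)
                changed = True
    return line


# Hypothetical positions: CHa = 5, CHb = 10, CHc = 6; 0 is the initial state.
cps = {"A": [0, 5], "B": [0, 10], "C": [0, 6]}
msgs = [
    ("B", 4, "C", 7),   # m4 (assumed B -> C)
    ("A", 8, "B", 12),  # m5
    ("C", 9, "B", 15),  # m6
]
print(recovery_line(cps, msgs, failed="B"))  # {'A': 5, 'B': 0, 'C': 6}
```

With these invented positions, process B ends up at its initial state rather than at CHb, which is the cascade effect described above.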
In an asynchronous checkpointing system, in order to prevent such a cascade rollback, a scheme called "message logging" is adopted, in which the received messages are stored in each process. Specifically, in FIG. 2(b), messages illustrated with squares are received messages that have been stored, while messages illustrated with triangles are received messages which have not yet been stored.
In FIG. 2(b), if a fault occurs in process B at a time-point indicated by the symbol X, process B restarts from CHb with message m5 stored, so re-execution is possible from the state prior to receipt of message m6. However, as the content of message m6 has been lost, process C is also re-executed from CHc. The step receiving m4 is re-executed using stored m4, and m6 is then transmitted. Execution continues without rollback of process A.
It should be noted that the operation of each process must be determinate, i.e., reproducible. That is, the process produces the same results every time it is executed from the same state with the same inputs. In contrast, an indeterminate process may produce a different result each time it is executed.
The process must be determinate because each process re-executes the receipt of messages using the stored messages. If the operation of any process were indeterminate, a re-executed message sending process could generate a message different from the received message stored in the receiving process.
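The role of determinacy in message logging can be illustrated with the following sketch. It is a hypothetical example, not a description of any cited system: the `LoggedProcess` class, its integer state, and the rule that each received value yields an outgoing message of twice the running total are all invented. Because the handler is determinate, replaying the stored messages from the checkpoint regenerates exactly the messages that were originally sent.

```python
class LoggedProcess:
    def __init__(self):
        self.total = 0       # process state
        self.checkpoint = 0  # state saved at the last checkpoint
        self.log = []        # received messages stored since the checkpoint

    def take_checkpoint(self):
        self.checkpoint = self.total
        self.log.clear()

    def _handle(self, value):
        # Determinate: the output depends only on state and input.
        self.total += value
        return self.total * 2  # outgoing message

    def receive(self, value):
        # Store the received message before processing it.
        self.log.append(value)
        return self._handle(value)

    def recover(self):
        # Roll back to the checkpoint, then replay the stored messages.
        self.total = self.checkpoint
        return [self._handle(v) for v in self.log]


p = LoggedProcess()
p.receive(3)
p.take_checkpoint()
original = [p.receive(4), p.receive(5)]
replayed = p.recover()
print(original, replayed, p.total)  # [14, 24] [14, 24] 12
```

If `_handle` depended on something outside the state and the stored input, such as a random number or the wall-clock time, the replayed outputs could differ from the originals, and messages already held by other processes would no longer match.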
Advantages/disadvantages of synchronous checkpointing and asynchronous checkpointing are listed below.
<<Synchronous checkpointing>>
Restart points can easily be ascertained (advantage).
Operation of each process may be indeterminate (advantage).
Message storing is not necessary (advantage).
When a fault occurs, all the checkpoint groups are rolled back (disadvantage).
<<Asynchronous checkpointing>>
Some of non-fault processes might not need to be rolled back (advantage).
Message storing is necessary requiring greater memory capacity (disadvantage).
If a message has not been stored, the number of processes to be rolled back and the ranges of rollback are increased (disadvantage).
Operation of each process must be determinate, i.e., any indeterminate program cannot be used (disadvantage).