1. Field of the Invention
This invention relates to reconstruction of the state of interrupted computer programs. More particularly, this invention relates to user-level checkpointing and restart of a program having multiple processes.
2. Description of the Related Art
Checkpointing is the procedure of saving the state of a running program so that it may be subsequently reconstructed and restarted, possibly on a different computer. Checkpoints are typically saved on computer readable storage media, and may be created at various times during the lifetime of an executing program. This is especially useful for long-running programs, where the likelihood of hardware failure during program execution increases with the length of time that the program has been executing. In addition, checkpointing has been found to be useful in debugging, for example in the detection of boundary condition errors. The technique has also been helpful in rollback recovery, process migration, job swapping, and virtual time simulations.
Implementation of checkpointing has been attempted at various levels. A basic example is the process scheduler found on most multi-tasking operating systems. When a process is required to relinquish the central processing unit, its state is saved so that it can continue at a later time. Some operating systems, e.g., IRIX from SGI, CRAY, and CONVEX OS, provide kernel support to checkpoint and restart a program or a family of programs.
User level checkpointing has also been attempted at the application level. Programmers can incorporate necessary state information into their programs, for example intermediate computations, which can be recovered should execution be interrupted. While this technique is flexible, and amenable to performance optimization, it is a severe burden on the programmer.
Improved user level checkpointing is provided by a specialized checkpointing library that is accessed by application programs, an approach that is far less onerous to the programmer. Such a library is provided by the Condor system and, in the UNIX™ environment, by the library Libckpt. The checkpoint-restart mechanism of Condor has also been implemented for many varieties of UNIX.
Modern operating systems allow the state of the file system to be checkpointed. However, user level checkpointing techniques are usually unable to recover the state of the operating system. For example, on restart at the user level, the process identifier of each process will generally be different from its pre-checkpoint counterpart. This effectively prevents the use of run-time support methods that rely on the process identifier of the calling process. This is because the kernel controls the distribution of process identifiers, and a user-level checkpoint mechanism has no way to influence the choice of the process identifier of the restarted process. Thus a program that is checkpointed and restarted with a user-level mechanism may not assume that its process identifier is unique throughout the entire execution of the program. Another major drawback is that parent-child relationships of checkpointed processes cannot easily be restored. Such parent-child process relationships often involve strong dependency on the process identifiers of the processes involved. Indeed, the Condor system does not support checkpoint and restart of a family of processes; only a single process can be checkpointed at a time, and any relations to other processes that may be checkpointed at the same time cannot be restored.
It would be desirable to have a user-level checkpoint and restart mechanism that provides for the restoration of families of processes, where parent-child relationships and shared file descriptors are restored. This would allow for user-level checkpointing of a much larger class of processes than is possible with the current state of the art.
It is a primary advantage of some aspects of the present invention that a family of programs can be checkpointed and restored.
A technique is disclosed herein for simultaneously checkpointing all of the processes in a specified process group or family at the user or application level, and restoring those processes at a later time, optionally on a different machine, with the pre-existing parent-child relationships remaining intact. This technique also provides for file descriptors that are shared among the processes at checkpoint time to be restored to the family of processes such that the file descriptors are shared just as they were at the time of taking the checkpoint.
The invention provides a method of restoring interrelated computer processes at an application level. The method comprises checkpointing a plurality of members of a process group at a checkpoint time, and restoring the members to define a restored process group, wherein interrelationships of the members which existed immediately prior to the checkpoint time are unchanged among corresponding members of the restored process group.
According to an aspect of the invention, at least two of the members share file descriptors immediately prior to the checkpoint time. Each of the file descriptors has a pre-checkpoint value, and the method further includes maintaining a record of an order in which pre-checkpoint files were opened by each of the members, and restoring includes identifying a first file descriptor of an associated inherited file from a parent process of the restored process group thereof, responsive to the record of the order, executing a file duplication system call corresponding to the first file descriptor to yield a second file descriptor, wherein a value of the second file descriptor equals the pre-checkpoint value of the first file descriptor, and closing the associated inherited file that corresponds to the first file descriptor.
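By way of illustration only, the descriptor restoration step described above can be sketched as follows. The sketch assumes that the file duplication system call is dup2( ), reached here through Python's os.dup2; the function name restore_descriptor and its parameters are hypothetical and do not appear in the disclosure.

```python
import os


def restore_descriptor(inherited_fd: int, pre_checkpoint_fd: int) -> int:
    """Re-expose an inherited open file under its pre-checkpoint number.

    inherited_fd      -- descriptor the restored process received from its parent
    pre_checkpoint_fd -- value the descriptor had before the checkpoint
    """
    if inherited_fd == pre_checkpoint_fd:
        # Already in the right slot; nothing to duplicate or close.
        return inherited_fd
    # Duplicate onto the exact pre-checkpoint value (assumption: dup2 semantics).
    os.dup2(inherited_fd, pre_checkpoint_fd)
    # Close the inherited descriptor so only the restored one remains.
    os.close(inherited_fd)
    return pre_checkpoint_fd
```

Because dup2( ) makes both descriptors refer to the same open file, the file offset and sharing among family members are preserved when the inherited descriptor is closed.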
According to another aspect of the invention, one of the pre-checkpoint files is a pipe, and checkpointing also includes duplicating a pipe descriptor of the pipe to define a duplicate pipe descriptor and storing the duplicate pipe descriptor, thereafter reading data from the pipe, storing the data, rewriting the data into the pipe, retrieving the stored data, and writing the stored data into a restored pipe of the restored process group corresponding to the duplicate pipe descriptor.
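A minimal sketch of the pipe handling described above follows. It assumes that the checkpointing process holds both ends of the pipe and that the pipe can be drained through a temporarily non-blocking duplicate descriptor; the function names snapshot_pipe and refill_restored_pipe are hypothetical.

```python
import os


def _write_all(fd: int, data: bytes) -> None:
    """Write every byte of data to fd, retrying on partial writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def snapshot_pipe(read_fd: int, write_fd: int) -> bytes:
    """Read the pending pipe data, put it back, and return a copy to store."""
    dup_fd = os.dup(read_fd)              # duplicate pipe descriptor
    was_blocking = os.get_blocking(dup_fd)
    os.set_blocking(dup_fd, False)        # flag is shared with read_fd
    chunks = []
    try:
        while True:
            try:
                chunk = os.read(dup_fd, 4096)
            except BlockingIOError:
                break                     # pipe is empty
            if not chunk:
                break                     # every writer has closed its end
            chunks.append(chunk)
    finally:
        os.set_blocking(dup_fd, was_blocking)
        os.close(dup_fd)
    data = b"".join(chunks)
    _write_all(write_fd, data)            # rewrite so the program sees no change
    return data


def refill_restored_pipe(restored_write_fd: int, stored: bytes) -> None:
    """On restart, load the stored data into the newly created pipe."""
    _write_all(restored_write_fd, stored)
```

Rewriting the data immediately after reading it leaves the running program undisturbed, so the checkpoint can be taken without terminating the process family.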
According to a further aspect of the invention, checkpointing includes identifying a first process identifier of a first child process. Restoring includes creating a second child process that corresponds to the first child process, identifying a second process identifier of the second child process, intercepting a first system call that uses the first process identifier, and substituting the second process identifier for the first process identifier in the first system call.
According to another aspect of the invention, identifying the second process identifier is performed by executing a second system call to create a third child process, and identifying a return value of the second system call. The second system call can be a fork( ) call.
Still another aspect of the invention includes invoking a third system call, wherein the first process identifier is a parameter of the third system call. The third system call may be a kill( ) call.
An additional aspect of the invention includes delaying until the third child process has exited. Delaying is accomplished by invoking a fourth system call to direct a signal to the third child process, wherein the first process identifier is a parameter or a return value of the fourth system call. The fourth system call may be a wait( ) call.
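The process identifier substitution described in the preceding aspects can be sketched as a translation table consulted by wrappers around fork( ), kill( ), and wait( ). This is an illustrative sketch only: the class PidMap and the wrapper functions are hypothetical names, and the disclosure does not prescribe how the system calls are intercepted.

```python
import os


class PidMap:
    """Two-way table between pre-checkpoint and post-restart process ids."""

    def __init__(self) -> None:
        self._current = {}   # first (pre-checkpoint) pid -> second (restored) pid
        self._original = {}  # second (restored) pid -> first (pre-checkpoint) pid

    def record(self, old_pid: int, new_pid: int) -> None:
        self._current[old_pid] = new_pid
        self._original[new_pid] = old_pid

    def to_current(self, pid: int) -> int:
        return self._current.get(pid, pid)

    def to_original(self, pid: int) -> int:
        return self._original.get(pid, pid)


def respawn_child(pid_map: PidMap, old_pid: int, child_main) -> int:
    """Recreate a child; the return value of fork() is its new identifier."""
    new_pid = os.fork()
    if new_pid == 0:
        try:
            child_main()
        finally:
            os._exit(0)      # never fall back into the parent's code
    pid_map.record(old_pid, new_pid)
    return new_pid


def kill_translated(pid_map: PidMap, old_pid: int, sig: int) -> None:
    """kill() as the program sees it: the pid parameter is substituted."""
    os.kill(pid_map.to_current(old_pid), sig)


def wait_translated(pid_map: PidMap, old_pid: int):
    """waitpid() with both the parameter and the return value translated."""
    real_pid, status = os.waitpid(pid_map.to_current(old_pid), 0)
    return pid_map.to_original(real_pid), status
```

With such wrappers in place, the restored program continues to use the identifiers it knew before the checkpoint, while the kernel sees only the identifiers it assigned on restart.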
According to a further aspect of the invention, checkpointing includes simultaneously transmitting a checkpoint signal to the members of the process group, and, responsive to the checkpoint signal, concurrently initiating checkpointing in all the members of the process group. The checkpoint signal may be generated by a killpg( ) system call.
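The group-wide checkpoint signal can be illustrated with the following sketch. The choice of SIGUSR1 as the checkpoint signal is an assumption, as are all function names; the single byte written by each handler merely stands in for saving the state of that process. The demonstration places the members in a process group of their own so that the signal does not reach the caller.

```python
import os
import signal

CHECKPOINT_SIGNAL = signal.SIGUSR1  # assumption: the source names no signal


def request_checkpoint(pgid: int) -> None:
    """Send the checkpoint signal to every member of process group pgid."""
    os.killpg(pgid, CHECKPOINT_SIGNAL)


def run_member(report_fd: int) -> None:
    """Body of one group member: wait for the signal, then 'checkpoint'."""
    def on_checkpoint(signum, frame):
        os.write(report_fd, b"C")   # stand-in for saving this process's state
        os._exit(0)

    signal.signal(CHECKPOINT_SIGNAL, on_checkpoint)
    os.write(report_fd, b"R")       # handler installed; ready for the signal
    while True:
        signal.pause()


def _read_exactly(fd: int, count: int) -> bytes:
    data = b""
    while len(data) < count:
        chunk = os.read(fd, count - len(data))
        if not chunk:
            break                   # all writers are gone
        data += chunk
    return data


def demo(members: int = 2) -> bytes:
    """Start a small process group, checkpoint it, return the reports."""
    r, w = os.pipe()
    leader = os.fork()
    if leader == 0:
        try:
            os.close(r)
            os.setpgid(0, 0)        # new group whose id is the leader's pid
            for _ in range(members - 1):
                if os.fork() == 0:
                    break           # each extra member inherits the group
            run_member(w)
        finally:
            os._exit(1)
    os.close(w)
    _read_exactly(r, members)       # wait until every member is ready
    request_checkpoint(leader)      # one call reaches the whole group
    reports = _read_exactly(r, members)
    os.waitpid(leader, 0)
    os.close(r)
    return reports
```

A single killpg( ) call reaches every member, so no member needs to know the identifiers of the others in order for checkpointing to begin in all of them.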
Still another aspect of the invention includes memorizing in each of the members of the process group a number of child processes created therefrom.
An additional aspect of the invention includes maintaining a first record in each of the members of the process group of a first process identifier of each of the child processes created therefrom, and maintaining a second record in each of the corresponding members of the restored process group comprising a second process identifier of each second child process corresponding to each of the child processes.
Another aspect of the invention includes checkpointing an exiting process at an exit time that is prior to the checkpoint time, and storing information of the exiting process, and restoring also includes recreating the exiting process in accordance with the stored information, and thereafter restoring children of the exiting process.
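The ordering constraint in this aspect, namely that an exited parent is recreated before its children are restored, can be sketched as a parent-first traversal of the saved records. The record layout and the names ProcessRecord and restore_order are hypothetical and serve only to illustrate the ordering; the disclosure does not specify how the stored information is organized.

```python
from typing import Dict, List, NamedTuple, Optional


class ProcessRecord(NamedTuple):
    old_pid: int                 # pre-checkpoint process identifier
    parent: Optional[int]        # old pid of the parent; None for the root
    exited: bool = False         # True if saved at exit time, not checkpoint time


def restore_order(records: List[ProcessRecord]) -> List[int]:
    """Return old pids so that every parent precedes its children."""
    children: Dict[Optional[int], List[int]] = {}
    for rec in records:
        children.setdefault(rec.parent, []).append(rec.old_pid)

    order: List[int] = []
    pending = list(children.get(None, []))   # start from the family root(s)
    while pending:
        pid = pending.pop(0)
        order.append(pid)                    # recreate this process first...
        pending.extend(children.get(pid, []))  # ...then queue its children
    return order
```

For example, given records for a root process 10, a process 20 that exited before the checkpoint, and its surviving child 30, the function returns the processes in the order 10, 20, 30, so that process 20 exists again when process 30 is restored beneath it.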
The invention provides a computer software product, comprising a computer-readable medium in which computer program instructions are stored, which instructions, when read by a computer, cause the computer to checkpoint, at an application level, a plurality of members of a process group at a checkpoint time, and to restore the members, wherein interrelationships of the members which existed immediately prior to the checkpoint time are unchanged among corresponding members of the restored process group.
According to an aspect of the invention, at least two of the members share file descriptors immediately prior to the checkpoint time. Each of the file descriptors has a pre-checkpoint value, and the computer further maintains a record of the order in which pre-checkpoint files were opened by each of the members. Restoring includes identifying a first file descriptor of an associated inherited file from a parent process of the restored process group thereof. Responsive to the record of the order, a file duplication system call is executed, using the first file descriptor to yield a second file descriptor, wherein a value of the second file descriptor equals the pre-checkpoint value of the first file descriptor. The associated inherited file that corresponds to the first file descriptor is closed.
According to yet another aspect of the invention, one of the pre-checkpoint files is a pipe, and checkpointing also includes duplicating a pipe descriptor of the pipe to define a duplicate pipe descriptor, storing the duplicate pipe descriptor, thereafter reading data from the pipe, storing the data, rewriting the data into the pipe, retrieving the stored data, and writing the stored data into a restored pipe of the restored process group corresponding to the duplicate pipe descriptor.
The invention provides a computer system, comprising a memory for storage of program instructions, and an execution unit that accesses the program instructions in the memory for execution thereof, wherein the program instructions cause the computer to perform at an application level checkpointing a plurality of members of a process group at a checkpoint time, and restoring the members to define a restored process group, wherein interrelationships of the members which existed immediately prior to the checkpoint time are unchanged among corresponding members of the restored process group.