1. Technical Field
The present invention relates to an improved data processing system and, in particular, to a method and apparatus for restarting a program or programs. Still more particularly, the present invention provides a method and apparatus for restarting programs and allowing them to continue to use old process identifications and thread identifications.
2. Description of Related Art
A computer program, also referred to as software, is a set of instructions that directs the functioning of various computer hardware resources in order to accomplish a particular task. In order to run a computer program, that program is typically loaded into the computer's main memory, where each instruction within the program is stored at a unique location, specified by an address.
A checkpoint is a snapshot of the image of a process, which is saved on non-volatile storage and which survives process failure. Checkpoint/restart facilities save the information, such as checkpoint data, necessary to restart the execution of a program from the point in its execution at which the information was saved. Upon recovery, the checkpoint can be reloaded into volatile memory, and the process can resume execution from the checkpointed state. Many applications, especially scientific applications, are compute-intensive and often take days or weeks to complete successfully. These applications often contain no means of saving temporary results, so a failure of any sort, such as a power, disk, or communication failure or a system crash, results in the loss of all work done up to the failure. Checkpoint/restart capability is a service by which the application status can be saved, or “checkpointed,” and later, if some failure occurs, resumed, or “restarted.”
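By way of illustration only, the save-and-resume cycle described above may be sketched at the application level as follows. This is a minimal sketch under stated assumptions, not the system-level facility that snapshots a whole process image: the function names save_checkpoint, load_checkpoint, and run_sum, the fail_at parameter, and the use of Python's pickle format are all hypothetical choices made for the example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to non-volatile storage; write a temp file, then rename it
    into place so a crash mid-write never leaves a partial checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return None


def run_sum(n, path, fail_at=None):
    """Sum 0..n-1, checkpointing after every step.

    fail_at (hypothetical) simulates a failure just before that step runs.
    """
    state = load_checkpoint(path) or {"i": 0, "total": 0}
    while state["i"] < n:
        if fail_at is not None and state["i"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(path, state)
    return state["total"]
```

In this sketch, a second call to run_sum with the same path after a failure picks up from the last saved step rather than from zero, which is the behavior a checkpoint/restart service provides for a whole process.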
Many system calls take a process or thread ID as a parameter or return an ID on success. For example, the “kill” system call in Unix takes the process ID of the process to which a signal is to be sent, and the “getppid” system call returns the process ID of the parent process, that is, the process that created the calling process.
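The two calls named above may be exercised, for example, through Python's os module, which wraps them on Unix-like systems. The following is a sketch only; the function name demo_pid_calls and the inline child script are assumptions introduced for the example.

```python
import os
import signal
import subprocess
import sys

# Hypothetical child script: report the parent's process ID, then idle.
CHILD_SCRIPT = "import os, time; print(os.getppid(), flush=True); time.sleep(60)"


def demo_pid_calls():
    """Start a child that reports getppid(), then signal it with kill()."""
    parent_pid = os.getpid()
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.PIPE,
        text=True,
    )
    # The child prints the ID returned by getppid(), i.e. this process's ID.
    reported_parent = int(child.stdout.readline())
    # kill() takes the process ID of the process to be signalled.
    os.kill(child.pid, signal.SIGTERM)
    returncode = child.wait()
    child.stdout.close()
    return parent_pid, reported_parent, returncode
```

On POSIX systems, Popen reports a child terminated by signal N with a return code of -N, so the third returned value indicates that the signal reached the process identified by child.pid.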
A problem encountered in restarting processes whose states were saved when they were checkpointed is that the system cannot guarantee assigning them the same process identifications (IDs) and thread IDs. This is because other processes or threads may already have been assigned one or more of those process IDs or thread IDs by the time the checkpointed processes are restarted.
In addition, if the process group leader of a checkpointed process was not itself checkpointed, the original process group ID of the restarted process may by then have been taken by another process.
Programs, however, often save their process IDs and thread IDs in program variables for reuse, to avoid making a system call each time they need their process ID or thread ID. Further, they often communicate their IDs to other processes in the application to facilitate inter-process communication. Therefore, a need exists in the art to allow restarted processes to continue to use their old process IDs and thread IDs even though the IDs assigned by the system now differ.
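The mismatch between a saved ID and a system-assigned ID may be illustrated with the following sketch. It uses fork only as a stand-in for a restart, since both yield a process whose memory holds an ID saved earlier while the system has assigned a different one; it is not a checkpoint/restart implementation. The names CACHED_PID and compare_in_new_process are hypothetical.

```python
import os

# Cached once so later code can avoid repeated getpid() system calls.
CACHED_PID = os.getpid()


def compare_in_new_process():
    """Fork; in the child, report the cached ID and the system-assigned ID."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: its memory still holds the ID cached by the original process.
        try:
            os.close(read_fd)
            os.write(write_fd, f"{CACHED_PID} {os.getpid()}".encode())
        finally:
            os._exit(0)
    # Parent: collect the child's report and reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    cached, actual = map(int, data.split())
    return cached, actual
```

The two returned values differ: the variable in the new process still carries the old ID, so any signal sent or message addressed using it would refer to the wrong process.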