1. Field of the Invention
The present invention generally relates to computer systems, and more particularly, to a checkpoint and restart facility of a computer system.
2. Description of the Related Art
A computer system is composed of a CPU, a memory, an external storage device, terminal equipment, and the like.
The CPU executes a program loaded into the memory.
An operating system (OS) is stored in the external storage device and is loaded into the memory when the computer system is started so as to control the computer system subsequently.
Like the OS, a user program is stored in the external storage device and is executed after being loaded into the memory by the OS according to an instruction from terminal equipment or the like.
Loading the entirety of a voluminous program or data into the memory is impossible since the size of the memory is limited. Therefore, a so-called virtual storage scheme is employed such that a process is carried out by exchanging portions of the program or the data between the memory and the external storage device.
The program is usually executed in units of execution called processes. One virtual space, composed of a virtual memory space, virtual registers and the like, is allocated to each process. A job is composed of at least one process and job information used to control the process.
Since a computer system is constructed such that it is used by a plurality of users and a plurality of programs are executed at the same time, the OS offers various functions.
For instance, the computer system offers a checkpoint and restart facility in preparation for a system failure occurring while a program is being executed.
A checkpoint facility provides for preservation of the runtime environment for each unit of execution such as an active job or an active process. A restart facility provides for restoration of the runtime environment preserved by the checkpoint facility and restarting of the execution of the program.
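The division of labor between the two facilities can be illustrated with a minimal sketch. It assumes, purely for illustration, that the runtime environment is a small dictionary and that it is preserved with Python's pickle module; the names checkpoint, restart, run and demo are hypothetical and do not describe any particular OS.

```python
import os
import pickle
import tempfile


def checkpoint(env, path):
    """Preserve the runtime environment (here, a plain dict) in a file."""
    with open(path, "wb") as f:
        pickle.dump(env, f)


def restart(path):
    """Restore the runtime environment preserved by checkpoint()."""
    with open(path, "rb") as f:
        return pickle.load(f)


def run(env, steps):
    """Toy program: advance a counter and accumulate a running total."""
    for _ in range(steps):
        env["pc"] += 1
        env["total"] += env["pc"]
    return env


def demo():
    """Compare an uninterrupted run with one resumed from a checkpoint."""
    fd, path = tempfile.mkstemp(suffix=".ckpt")
    os.close(fd)
    try:
        env = run({"pc": 0, "total": 0}, 3)
        checkpoint(env, path)              # preserve state after 3 steps
        uninterrupted = run(env, 2)        # keep going without a failure
        resumed = run(restart(path), 2)    # restore, then redo the last 2 steps
        return uninterrupted, resumed
    finally:
        os.remove(path)
```

In this sketch both runs end in the same state, which is the property a restart facility is meant to provide.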
The OS provides system calls, such as file input and output, for operations commonly performed by various user programs. A system call acts like a subroutine of the calling process. When a system call is requested, the OS executes the associated operation in the process space of the program requesting the system call. The user is charged for the CPU time consumed while the OS and the user program are run.
The following problems have been recognized in the checkpoint and restart facility of the related art.
With some checkpoint implementations, the runtime environment of the process being checkpointed is preserved within one of the system calls requested by the user program. That is, the checkpoint process is not distinguished from the processing executed on behalf of the user program. As a result, the user is charged for the checkpoint process.
The checkpoint process may be executed at the system operator's discretion, so it is not desirable to uniformly charge the user for checkpoint processing. This is the first problem in the checkpoint and restart facility according to the related art.
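The first problem can be sketched as follows. The example assumes a single per-process CPU counter in milliseconds; ProcessAccount, system_call and all figures are hypothetical and only illustrate how checkpoint time becomes indistinguishable from billed time when it runs inside a system call.

```python
from dataclasses import dataclass


@dataclass
class ProcessAccount:
    """One counter per process: everything charged here is billed to the user."""
    billed_cpu_ms: int = 0

    def charge(self, ms):
        self.billed_cpu_ms += ms


def system_call(account, work_ms, checkpoint_ms=0):
    """Assumed related-art behavior: a checkpoint taken inside the system
    call runs in the caller's process space, so its CPU time is added to
    the same counter as the requested work."""
    account.charge(work_ms + checkpoint_ms)


account = ProcessAccount()
account.charge(100)                             # user program itself
system_call(account, 20)                        # ordinary file I/O request
system_call(account, 20, checkpoint_ms=500)     # operator-initiated checkpoint
print(account.billed_cpu_ms)
```

Of the 640 ms billed here, 500 ms stem from a checkpoint the user did not request, and the single counter gives no way to separate them afterwards.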
A second problem with the checkpoint and restart facility according to the related art is that account information is not output to an account file until a job is terminated.
Not all user programs end normally. Some store intermediate output in a file during processing so that execution can be continued later using the intermediate output. In such cases, it is desirable from the standpoint of system operation to charge the user for the processing even if the program does not end normally.
When the checkpoint and restart process is executed, the user is charged for the job all over again each time it is restarted. For example, when a job is checkpointed at a midpoint for purposes such as debugging and is then restarted from the checkpoint a plurality of times under different conditions, the user is charged multiple times for the period from the start of the job to the checkpoint.
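A small worked example makes the repeated charge concrete. It assumes one reading of the passage above, namely that each restarted run is billed for the pre-checkpoint period plus its own post-checkpoint time; the function names and the figures are hypothetical.

```python
def billed_cpu(prefix, tails):
    """CPU time billed when every restarted run is charged again for the
    period from job start to the checkpoint (prefix)."""
    return sum(prefix + tail for tail in tails)


def consumed_cpu(prefix, tails):
    """CPU time actually consumed: the prefix runs once, each tail once."""
    return prefix + sum(tails)


prefix = 60            # seconds from job start to the checkpoint
tails = [10, 15, 12]   # three restarts under different conditions

print(billed_cpu(prefix, tails), consumed_cpu(prefix, tails))
```

With these figures the user is billed 217 seconds for 97 seconds of actual CPU use; the 120-second difference is the prefix charged two extra times.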
A third problem is that only active files (files that are open or being accessed) are preserved in the checkpoint process. More specifically, the restart process cannot restore the runtime environment using only the information preserved at the checkpoint if a file other than the files preserved at the checkpoint is updated subsequent to the checkpoint process.
The third problem will be explained in detail below.
It is assumed that file A starts being used at time T0 and a checkpoint process is started at time T1. It is then assumed that the content of file B is referred to by process X at time T2 and file B is updated by process Y at time T3. When the runtime environment saved at time T1 is restored and the process is restarted, the content of file B seen by process X upon restart differs from what it was at time T2, because of the update made by process Y at time T3, and the earlier content cannot be restored.
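The timeline above can be replayed with a toy model. It assumes that files are simple named strings and that the checkpoint copies only the files active at time T1; ToySystem, scenario and the file contents are hypothetical.

```python
class ToySystem:
    """Toy file store whose checkpoint preserves active files only."""

    def __init__(self, files):
        self.files = dict(files)   # file name -> content
        self.active = set()        # files currently in use
        self.saved = None          # contents preserved at the checkpoint

    def open(self, name):
        self.active.add(name)

    def checkpoint(self):
        # Only files that are active at this moment are preserved.
        self.saved = {name: self.files[name] for name in self.active}

    def restart(self):
        # Only the preserved files can be rolled back.
        self.files.update(self.saved)


def scenario():
    system = ToySystem({"A": "a0", "B": "b0"})
    system.open("A")                   # T0: file A starts being used
    system.checkpoint()                # T1: checkpoint (file B is not active)
    system.files["A"] = "a1"           # file A is updated after the checkpoint
    seen_at_t2 = system.files["B"]     # T2: process X refers to file B
    system.files["B"] = "b1"           # T3: process Y updates file B
    system.restart()                   # restore the environment saved at T1
    return seen_at_t2, system.files["B"], system.files["A"]


print(scenario())
```

In this model file A returns to its checkpointed content, while file B keeps the content written at T3, so process X would see different data after the restart than it saw at T2.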