1. Field of the Invention
The present invention relates to a file name detecting method for use with an operating system that manages a hierarchically structured file system, and to a checkpoint and restart method using the file name detecting method.
2. Description of the Related Art
FIG. 1 shows an example of a hierarchically structured file system to which the present invention is directed. In this figure, squares indicate files, and circles indicate directories, i.e., intermediate nodes leading to files. Each of the files and the directories is specified and managed with its device number and i node number. In FIG. 1, the device number is a number, 1 or 2, attached to a device (disk), and the i node number is a number attached to a directory or file in each device.
In FIG. 1, file names are stored in directories. For example, the name of a file, x, is stored in the directory e, and the name of a file, v, is stored in both the directories c and g. The name of each directory is not stored in it but stored in an immediately preceding directory. For example, the name of the directory e is stored in the directory d.
There are bidirectional pointers between directories. For files, however, there are pointers only from the side of immediately preceding directories. That is, there are no backward pointers. Thus, for example, the directory e can point to the file x, but the reverse is impossible.
A range of nodes from the root node to each file is referred to as a complete path name. For example, the complete path name of the file x is represented by /a/b/d/e/x. The file v has two complete path names: /a/b/g/v and /a/c/v.
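By way of illustration only, the naming relationships described above may be modeled as nested directories whose entries map names to files. The following is a minimal sketch; the (device, i node) pairs `FILE_X` and `FILE_V` and the helper `complete_path_names` are hypothetical and do not reproduce the actual numbers of FIG. 1.

```python
# Hypothetical (device number, i node number) pairs; not the values of FIG. 1.
FILE_X = (1, 10)
FILE_V = (1, 11)

# Each directory maps an entry name to a sub-directory (dict) or a file (tuple).
# Only the nodes named in the text are modeled.
root = {
    "a": {
        "b": {
            "d": {"e": {"x": FILE_X}},
            "g": {"v": FILE_V},
        },
        "c": {"v": FILE_V},
    }
}


def complete_path_names(directory, target, prefix=""):
    """Return every complete path name under `directory` that leads to `target`."""
    paths = []
    for name, child in sorted(directory.items()):
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            # A directory: descend into it.
            paths.extend(complete_path_names(child, target, path))
        elif child == target:
            # A file whose (device, i node) pair matches the target.
            paths.append(path)
    return paths
```

Because the names are held by the directories and not by the files, the only way to find the names of the file v in this model is to walk downward from the root, which yields both of its complete path names.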
The physical structure of the file system will be described. As described above, both the files and the directories are managed with the device numbers and the i node numbers. Each disk is specified by its own device number and a physical location within each disk is specified by an i node number.
The entire disk is divided into a plurality of blocks with the size of each block fixed. In general, the first block is a spare block, and the second block is a super block. The super block stores management data for the entire file system including, for example, the size of one block, i.e., the number of bytes, and the number of i node blocks.
The other blocks store i node blocks and data blocks. Usually, the first n blocks are used as i node blocks and the remaining blocks are used as data blocks. One i node block stores a plurality of i nodes. From an i node number the physical position of the corresponding i node, i.e., the location of the corresponding block and the relative position of that node within that block, is obtained.
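The derivation of a physical position from an i node number may be sketched as simple integer arithmetic. The sketch below assumes, purely for illustration, that i nodes are numbered from 1, that blocks are indexed from 0 so that the first i node block is block 2 (after the spare block and the super block), and that every i node block holds the same number of i nodes. The function name `locate_inode` and its parameters are hypothetical.

```python
def locate_inode(inode_number, inodes_per_block, first_inode_block=2):
    """Map an i node number to (block index, slot within that block).

    Assumptions (not taken from the text): i nodes are numbered from 1,
    and the i node blocks are contiguous starting at `first_inode_block`.
    """
    if inode_number < 1:
        raise ValueError("i node numbers are assumed to start at 1")
    index = inode_number - 1
    block_offset, slot = divmod(index, inodes_per_block)
    return first_inode_block + block_offset, slot
```

For example, with 8 i nodes per block, i node number 20 falls in the third i node block (block index 4) at slot 3.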
In reality, the files and directories each comprise one i node and one or more data blocks. In both the files and the directories, each i node stores the owner's name, access permission conditions, the date of update, the size of that node (the number of bytes), and others, which serve as management information, and one or more block numbers of data blocks in which data for that i node are stored. The data block number determines the physical position of the corresponding block. The data block number and the i node block number are independent of each other.
In the data blocks of each file, the contents of that file themselves are stored. In contrast, each directory stores in its data blocks its i node number, the i node number of its parent directory, i.e., the upper directory of FIG. 1, and the names and the numbers of i nodes of its child or lower directories.
In an operating system for processing such a hierarchically structured file system (for example, UNIX), an application program (task) specifies the name of a file prior to file processing to request the operating system to open that file. For example, the application program specifies the complete path name /a/b/d/e/x to request the operating system to open the file x.
FIG. 2 is a flowchart for the process of opening a file by the operating system. FIG. 3 shows management tables created when the process is carried out.
When requested by the application program 8 (FIG. 3) to open a file, the operating system searches the file system 3 for the program-specified file, makes that file ready for access, and returns a file descriptor for the specified file to the application program 8.
The process will be described with reference to the flowchart of FIG. 2. In step S10, a search is made of the file system for a file with the specified name. In practice, the device number and the i node number of the file are obtained. In subsequent step S11, a file management table 4 (see FIG. 3) for that file thus obtained is created to store the device number and i node number of the specified file.
In subsequent step S12, within the operating system, a pointer to the file management table 4 created in step S11 is stored in a task-to-file correspondence management table 5 of FIG. 3. This table also stores data indicating to what extent the file has been read or written. In subsequent step S13, an entry is added to a file descriptor management table 7 in a task management table 6 of FIG. 3, and a pointer to the task-to-file correspondence management table 5 created in step S12 is stored in the table 7. In step S14, the added entry, e.g., an entry number, is presented to the open-requesting application program (task A) 8 as a file descriptor, thereby terminating the file opening process.
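The chain of tables built by steps S10 to S14 may be modeled as follows. This is a simplified sketch and not an actual kernel implementation: the class names, the `name_index` dictionary standing in for the file system search of step S10, and the (device, i node) values in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileManagementTable:
    """Stands in for the file management table 4."""
    device: int
    inode: int


@dataclass
class TaskFileEntry:
    """Stands in for the task-to-file correspondence management table 5."""
    file_table: FileManagementTable
    offset: int = 0  # how far the file has been read or written


@dataclass
class TaskManagementTable:
    """Stands in for the task management table 6 holding table 7."""
    descriptors: list = field(default_factory=list)


def open_file(task, name_index, path):
    """Model of the file opening process; returns an entry number as descriptor."""
    device, inode = name_index[path]                 # S10: search by name
    file_table = FileManagementTable(device, inode)  # S11: create table 4
    entry = TaskFileEntry(file_table)                # S12: table 5 points to table 4
    task.descriptors.append(entry)                   # S13: add entry to table 7
    return len(task.descriptors) - 1                 # S14: entry number as descriptor
```

Calling `open_file(TaskManagementTable(), {"/a/b/d/e/x": (1, 10)}, "/a/b/d/e/x")` returns the descriptor 0. Note that none of the modeled tables retains the path string, so the name cannot be recovered from the descriptor alone.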
The application program 8 subsequently specifies the file descriptor for the file opened by the opening process of FIG. 2 and requests that the operating system process the file. That is, once the file has been opened, the file name is no longer used in accessing that file. Instead, the operating system follows the control tables in accordance with the specified file descriptor, accesses the file via the file management table 4, and carries out specified processing on that file. At this point, the operating system carries out the processing without storing the file name in its main storage. In other words, the operating system has no means of knowing the corresponding file name from any file descriptor.
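On a present-day UNIX-like system the same situation can be observed from a user program. The sketch below is illustrative and assumes Python's standard `os` and `tempfile` modules; the helper name `identity_from_descriptor` is hypothetical. The status obtained through a descriptor identifies the file by device number and i node number, and it carries no file name.

```python
import os
import tempfile


def identity_from_descriptor(fd):
    """Return the (device number, i node number) pair known for an open descriptor."""
    status = os.fstat(fd)
    return status.st_dev, status.st_ino


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "x")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # The descriptor and the name lead to the same (device, i node) pair,
        # but only the name-based lookup started from a path.
        by_name = os.stat(path)
        same_file = identity_from_descriptor(fd) == (by_name.st_dev, by_name.st_ino)
    finally:
        os.close(fd)
```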
Hereinafter, program checkpoint and restart processing, which is the other subject of the present invention, will be described. Checkpoint and restart processing is a method in which the intermediate status of a program is saved during its execution as a provision against the occurrence of some abnormality. In the event of an abnormality, the program can be restarted from the point at which a checkpoint was taken, which avoids reexecuting the entire program from the beginning.
The checkpoint and restart processing is used not only for providing against abnormality but also for executing a program for a long time, for example. That is, the checkpoint and restart facility is also used in the event that, when the program processing will not terminate in a day, the program execution is terminated at the end of the day and the restart is made the next day from the point at which the checkpoint was taken the previous day.
The checkpoint and restart processing includes a process of taking a checkpoint and a process of restarting. The checkpoint taking process is a process of saving the status at a point when a program is being executed. Data to be saved include the contents of the program and its management information (for example, the position of the point of interruption) and the contents of a file and its management information (for example, the access location and access status of the file).
The restart process is a process which, in the event of abnormality, restores the program and the environment in which it was executed, on the basis of information saved at a point when a checkpoint was taken and restarts processing from the state at the point at which the checkpoint was taken.
In the prior art, how a program treats a file that is open at the point at which a checkpoint is taken depends uniquely on an attribute of that file. That is, one of the following three file treatments is uniquely applied: (1) on restart, the writing of the file begins from the position at the time the checkpoint was taken; (2) all the contents of the file are saved at the time of checkpoint and restored on restart; and (3) the system restores no file(s).
For this reason, during job execution, how each file is treated at a checkpoint cannot be changed, regardless of data attribute. In particular, when a lengthy job is carried out, the system environment may vary greatly between the time the job was started and the time a checkpoint is taken, yet individual files cannot be treated separately during job execution. Further, files of the same attribute cannot be treated differently even if the program in which checkpoints are taken requires that they be treated differently.
In the checkpoint taking process in an operating system using such a hierarchically structured file system as described in connection with FIG. 1, the program in which checkpoints are taken cannot know the name of a file being processed. A method is therefore used which saves the device and i node numbers of the file being processed, together with other management information needed for file restoration, and restores that file on restart.
In the hierarchically structured file system described in connection with FIG. 1, once a file has been opened, access to that file is obtained by a file descriptor, and the name of that file cannot be used at all.
After a file has been opened, access to that file (for the purpose of reading or writing) is permitted simply by specifying a file descriptor. In some cases, however, the name of the file would also be needed. For example, when an error occurs while a given file is being processed, it will sometimes be desired to output a message containing the name of that file, stating that an error occurred while that specific file was being processed.
With a hierarchical file system, it is impossible to backward trace an open file to a directory which directly points to that file. The operating system also has no memory of the name of the open file. Therefore, when the name of a file is needed after it has been opened, conventional application programs use either of the following methods.
(1) When opening a file, a program operating within a task stores a correspondence between a file name (for example, a complete path name) and a file descriptor in the form of a table and, when the file name is needed, looks up the file name in the table according to the file descriptor as key.
(2) In the case of an existing program which does not create such a correspondence table as described above, the program searches the files that a task may open, using the file descriptor as a key, to obtain the file name.
Both of the above methods have problems. With method (1), a program within a task must manage the correspondence between file descriptors and file names, which makes processing complex. If the program that opens a file and the program that processes the open file differ, they must agree on an interface for managing the correspondence between file descriptors and file names. In particular, if an existing program adapted to accept only file descriptors later comes to need file names (for example, because a facility is added), it becomes necessary to change not only that program but also the other program (the file opening program) so that the file-name-to-file-descriptor correspondence is managed. If changes cannot be made to the other program (for example, where it is a third party's program whose source program is not available), the method cannot be applied.
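Method (1) may be sketched as a pair of wrappers that record the name at open time. The names `fd_to_name`, `open_recorded`, `close_recorded`, and `name_of` are hypothetical. The sketch also makes the limitation visible: a descriptor opened by any code that bypasses `open_recorded` has no entry in the table.

```python
import os

# Correspondence table: file descriptor -> complete path name.
fd_to_name = {}


def open_recorded(path, flags, mode=0o600):
    """Open a file and record its name against the returned descriptor."""
    fd = os.open(path, flags, mode)
    fd_to_name[fd] = os.path.abspath(path)
    return fd


def close_recorded(fd):
    """Close a file and drop its entry, since descriptor numbers are reused."""
    os.close(fd)
    fd_to_name.pop(fd, None)


def name_of(fd):
    """Look up the name by descriptor; None if the file was opened elsewhere."""
    return fd_to_name.get(fd)
```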
With method (2), all files that a task may process need to be searched (including files opened by that task and files opened by a parent task). Unless task activation conditions are known, the whole file system has to be searched, which increases retrieval time.
If a table recording, for all files, the correspondence between file names and keys that can be obtained from file descriptors were set up beforehand, fast retrieval would be possible. In principle, however, it is impossible for a program within a task to keep such a table correct at all times, because a program within a task cannot know the name of a file opened by another task. The operating system could perform such management. However, this would waste time in recording the lengthy names that arise in a hierarchical file system, and would waste the disk storage areas required for such names.
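Method (2) amounts to an exhaustive walk that compares the (device number, i node number) pair of every candidate file with that of the open descriptor. The following is a minimal sketch assuming Python's standard `os` module; the function name `find_names` and the `search_root` parameter are hypothetical. The cost is proportional to the number of files under `search_root`, which is the retrieval time problem noted above.

```python
import os


def find_names(fd, search_root):
    """Return every path under `search_root` that names the file open on `fd`."""
    target = os.fstat(fd)
    key = (target.st_dev, target.st_ino)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            try:
                info = os.lstat(candidate)
            except OSError:
                continue  # the entry vanished or is unreadable; skip it
            if (info.st_dev, info.st_ino) == key:
                matches.append(candidate)
    return sorted(matches)
```

A file with two complete path names, such as the file v of FIG. 1, yields two matches, so the search by itself cannot tell which name was used to open the file.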
If there were a general, fast facility to obtain a file name from a file descriptor, it could easily be applied to a process of displaying the status of an operating task or a process of knowing the name of an open file in the checkpoint and restart processing.
The problems with the checkpoint and restart processing will be described next.
As described above, the treatment of a file at the time of checkpoint and restart is uniquely determined by the data attribute of the file and cannot be changed throughout job execution.
In particular, when a job is carried out for a long time and the capacity of a file needed at the start of job execution cannot be anticipated, the file capacity might exceed the available storage capacity. A facility has been desired which permits individual files to be treated differently during the execution of a job. Depending on the processing of a checkpoint program, even files of the same attribute might need to be treated differently. In such a case as well, however, they would be treated identically.
With the recent development of supercomputers, long-running jobs (e.g., jobs lasting several days) are often carried out. In this field, the capacity of a file subject to checkpointing can grow beyond what was anticipated prior to job execution. In such a case, the prior art method has no choice but to increase the storage capacity, because the file treatment in the checkpoint and restart processing is fixed. If the capacity could not be increased, processing would fail for lack of capacity at the time a checkpoint is taken.
In the middle of execution of a job, the capacity of the corresponding file is checked. Even if the file has been specified at first to be saved, this specification is canceled when there arises the possibility that the file capacity may exceed the packaged storage capacity. If, in this case, the file can be saved on magnetic tape by some other means, the checkpoint and restart processing will have a wide range of applicability.
With respect to the prior art checkpoint/restart technique, the following three problems will be further described. The first problem is that it is difficult to use one system for taking checkpoints and a different system for restarting. In practice, it is almost impossible.
When it is desired to run a program on a system to take checkpoints and to perform restarting by another system, the i node numbers would generally vary between the checkpoint taking system and the restart system. In addition, the device numbers might also differ.
Thus, the prior art method, which saves device numbers, i node numbers, and the like as file management information, does not allow the checkpoint taking system and the restart system to differ. To change systems, some other method is needed to convert the i node numbers (and the device numbers) to conform to the changed system.
For example, one may wish to take checkpoints on one system and then restart from a checkpoint on another, higher-performance system because the job requires more time than expected. This cannot be done easily.
The second problem is that, depending on the timing of checkpoints, a temporary file, i.e., a temporary file with no name, may not be saved and restored well. That is, a method which simply gives commands from the outside of a program may fail to perform the checkpoint taking process properly.
Here, the temporary file refers to a file that is used temporarily at the time of program execution and deleted from a file system after program execution. If a file that has become unnecessary is not removed in a timely manner, it will remain as "garbage" in the file system, reducing available disk space.
A temporary file with no name is made by calling an unlink function on the file after it has been opened, which declares the file to the operating system as a temporary file. An unlinked file can no longer be accessed by file name after that time (access by file descriptor is still permitted) and is automatically removed from the file system at the time the file is closed.
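The open-then-unlink idiom can be sketched as follows on a POSIX-style system (the sketch is not expected to work on systems that refuse to remove an open file). The helper name `make_nameless_temp` and the file name `scratch.tmp` are hypothetical.

```python
import os
import tempfile


def make_nameless_temp(directory):
    """Create a file, then unlink it so that only the descriptor remains."""
    path = os.path.join(directory, "scratch.tmp")
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    os.unlink(path)  # the name disappears; the open file itself does not
    return fd, path


with tempfile.TemporaryDirectory() as workdir:
    fd, old_path = make_nameless_temp(workdir)
    try:
        os.write(fd, b"intermediate results")
        os.lseek(fd, 0, os.SEEK_SET)
        readable_by_descriptor = os.read(fd, 12) == b"intermediate"
        reachable_by_name = os.path.exists(old_path)
    finally:
        os.close(fd)  # the storage is released when the last descriptor closes
```

In this sketch the data remain readable through the descriptor while the former name no longer resolves.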
The prior art method examines whether a file is unlinked or not at the time a checkpoint is taken and, if it is unlinked, saves its contents together with file management information. On restart the file is opened by using an arbitrary name (since it is unlinked, any name is permitted). The file is restored using the saved information (the contents of the file and the management information) and is unlinked at the same time. Thus, a temporary file (nameless temporary file) can be restored. Note that, in the prior art as well, the contents of such a temporary file are restored by specifying a provisional name (the device number and the i node number are not used).
However, depending on the timing of checkpoints, the restart cannot be made well. For example, FIG. 4 shows a case where a checkpoint is taken at the point of (2) processing 1, between (1) file open processing (open) and (3) unlink processing (unlink), and a restart is made at the point of (5) processing 3, after the file processing terminates and the file is removed at the point of (4) close. In this case, however, such a restart cannot be made well.
That is, since the file is not unlinked when the checkpoint was taken, the checkpoint taking program cannot recognize it to be a temporary file and thus processes it as a general file (the prior art method saves information including the file access position). The file is later unlinked (declared as a temporary file), then removed from the file system during the file closing process. Subsequently the restart processing is carried out. Even if an attempt is made to restore the file access state to what it was at the point at which the checkpoint was taken using various kinds of management information, the file cannot be restored properly because the file itself is no longer present.
In this case, although the original file (temporary file to be processed) is not present, its device number and i node number may have been allocated to another file. If an entirely separate program is run before a restart is made to thereby create a file in the file system, an i node number will be allocated to that file. There is the possibility that this i node number may coincide with the i node number of the file removed previously.
That is, although the original file (file allocated by the (1) open processing) is no longer present, a separate file that coincides with that file in i node number and device number may be present.
As long as the restart program restores a file on the basis of a device number and an i node number, the i node number can be checked to decide whether the file has been removed or not. Some action can thus be taken. However, if the device number and the i node number have been allocated to another file, the restart program has no means of detecting this fact and thus cannot help assuming that processing was performed properly. The restart program will then resume the program to be run. As a result, an entirely different file will be read, and the file that was to be restored will not be restored.
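The failure described above can be reproduced in a toy model. The class `TinyFileSystem` below is hypothetical, and its policy of handing out the lowest free i node number is an assumption chosen to make the reuse deterministic; real file systems may allocate differently.

```python
class TinyFileSystem:
    """In-memory stand-in for one device; maps i node numbers to contents."""

    def __init__(self, device=1):
        self.device = device
        self.inodes = {}

    def create(self, contents):
        # Assumed policy: allocate the lowest free i node number.
        inode = 1
        while inode in self.inodes:
            inode += 1
        self.inodes[inode] = contents
        return (self.device, inode)

    def remove(self, file_id):
        del self.inodes[file_id[1]]

    def read(self, file_id):
        device, inode = file_id
        if device != self.device:
            raise KeyError(file_id)
        return self.inodes[inode]


fs = TinyFileSystem()
temp_id = fs.create(b"scratch data")             # (1) open
checkpoint = {"file_id": temp_id, "offset": 5}   # checkpoint taken during (2)
fs.remove(temp_id)                               # (3) unlink and (4) close
other_id = fs.create(b"unrelated file")          # a separate program runs
restored = fs.read(checkpoint["file_id"])        # restart at (5)
```

Here `other_id` equals `temp_id`, so the lookup on restart succeeds without error and returns the contents of the unrelated file.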
The checkpoint and restart at such timing will result in another problem that even a nameless file cannot be processed properly unless it is restored under the same name.
That is, since a file name (path name) is specified in the unlink processing after the open processing, the unlink at the time of reexecution will fail unless the same name as that at the time of opening is specified.
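The dependence on the original name can be illustrated as follows. The helper `restore_then_unlink` is hypothetical: it restores saved contents under a chosen name and then performs the unlink call that the resumed program would issue with the name it used at open time.

```python
import os


def restore_then_unlink(directory, restored_name, name_in_program, contents):
    """Restore a file as `restored_name`, then unlink `name_in_program`.

    Returns True if the program's unlink succeeds, False if the name is absent.
    """
    with open(os.path.join(directory, restored_name), "wb") as restored:
        restored.write(contents)
    try:
        os.unlink(os.path.join(directory, name_in_program))
    except FileNotFoundError:
        return False
    return True
```

Restoring under the same name lets the unlink succeed, whereas restoring under an arbitrary name leaves the program's unlink with nothing to remove.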
Note that a method by which a checkpoint is taken at any point, processing is interrupted at that point, and a restart is later made from that checkpoint (for example, the next day) could circumvent such a problem, but it is inadequate for abnormal situations.
With the prior art method, the use of a temporary file with a name could circumvent the above-described problem for the time being, but this would result in a new problem. Here, all temporary files created under a directory on the basis of system operating conventions are considered as temporary files with names. That is, a system operator removes all temporary files with names at system startup or at a proper time so that unnecessary files will not remain on a disk.
Thus, the above-described problem could be circumvented by removing temporary files with names only at the completion of the execution of the checkpoint and restart processing program, without removing them during the execution of that program. However, temporary files with names that other users use, as well as those that the checkpoint processing program uses, would then remain unremoved in the file system for that time, putting pressure on disk capacity. If such a situation occurred, other users would be required to remove their temporary files with names.
The third problem is that, of the checkpoint and restart processing programs, the restart processing program in particular must be implemented within the kernel of the operating system, and enlarging the scale of the operating system results in an increase in the amount of memory required and a decrease in reliability.
With the prior art restart processing program based on physical management information for files, it is necessary to restore the various control tables within the operating system that are associated with the files the operating system manages. The control tables that the operating system manages must be restored by the kernel of the operating system. Thus, running the restart processing program involves adding facilities to the kernel of the operating system or modifying the kernel, which results in an increase in the scale of the kernel of the operating system.
Bugs in the kernel of the operating system may result in the system going down. Thus, an increase in the scale of the operating system will result in a decrease in system reliability. In addition, the kernel of the operating system needs to reside permanently in memory, which undesirably increases the amount of memory required.