The volume of digital information has grown, and its added value has increased, with the spread of information processing by computers and the like. To prevent the loss of such high-value information due to disasters and the like, it is recommended to back up a file system regularly to another storage medium, such as magnetic tape or a hard disk drive, for redundant recording. A backup process requires a certain amount of time because it involves reading the information recorded in the original storage medium and writing it to another storage medium.
The processing load of a backup operation is not a significant problem when the amount of data is small. Regular backup of business-related information, which may exceed several terabytes, however, imposes a heavy processing load on an information processing apparatus because it is performed periodically, and it takes an increasingly long processing time as files grow in size.
Reducing the time required for the backup process permits a Recovery Point Objective (RPO) to be set at shorter time intervals, which in turn enables data restoration with higher accuracy; improving the efficiency of the backup process is therefore an issue of increasing importance. To perform a regular backup efficiently and without overlap, the backup is often performed by detecting differential data between the files currently stored in a storage device and the files recorded in the last backup, and backing up only the detected differential data. This scheme will be hereinafter referred to as a differential backup.
One known method of detecting differential data for a differential backup combines an inode scan with a comparison of two file lists, the current file list and the previous file list. An “inode” is an object that stores the metadata of a file (such as size, mtime, UID, and data block address), and one exists for each file. Inode scan refers to the process of reading the metadata of each file present in a file system and listing those files whose last update time (mtime/ctime) is later than the time of the last backup.
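The inode scan described above can be illustrated with a minimal Python sketch. It is an approximation under stated assumptions: a real inode scan typically reads the file system's inode structures directly, whereas this sketch walks the directory tree and calls `stat` on each entry. The names `InodeRecord`, `inode_scan`, `is_changed`, and `changed_since` are hypothetical and do not come from any particular product. A POSIX-style file system is assumed, where `st_ctime` is the inode change time.

```python
import os
from typing import Iterator, List, NamedTuple


class InodeRecord(NamedTuple):
    """Subset of inode metadata used for change detection (hypothetical type)."""
    inode: int
    path: str
    size: int
    mtime: float
    ctime: float
    uid: int


def inode_scan(root: str) -> Iterator[InodeRecord]:
    """Yield metadata for every regular file under root.

    Assumption: a directory walk stands in for a true inode-table scan.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    yield InodeRecord(st.st_ino, entry.path, st.st_size,
                                      st.st_mtime, st.st_ctime, st.st_uid)


def is_changed(record: InodeRecord, last_backup_time: float) -> bool:
    """True if the file's mtime or ctime is later than the last backup."""
    return max(record.mtime, record.ctime) > last_backup_time


def changed_since(root: str, last_backup_time: float) -> List[InodeRecord]:
    """List the files created or updated after last_backup_time."""
    return [r for r in inode_scan(root) if is_changed(r, last_backup_time)]
```

Note that the result contains only files that currently exist; a file deleted since the last backup has no inode to read, so it cannot appear in this list.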
Inode scan, however, can only detect files that actually exist and have a recent last update time. Accordingly, inode scan has to be combined with file list comparison in order to find deleted files. Consequently, when inode scan is applied to a huge file system that contains several billion files, an enormous amount of time is required both to generate the file lists and to compare them, because the file lists themselves become large.
One prior art backup scheme will now be described with reference to FIG. 1. In the traditional scheme shown in FIG. 1, inode scan is employed to create a file list of files whose last update time is later than the time of the last backup, in order to find the differential data, that is, files that have been newly created and/or updated since the last backup. In this process, a last full file list 1002 and a current full file list 1003 are created and saved as intermediate files for detecting files that have been deleted since the last backup.
After completion of the inode scan, the current full file list is compared with the full file list that was created at the last backup, deleted files are detected, and a deleted file list 1001 is created. More specifically, to create the deleted file list 1001, the two file lists 1002 and 1003, sorted in order of inode number, are compared with each other as shown in FIG. 1. This comparison needs to be performed on all of the files registered in the file lists 1002 and 1003.
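The comparison of two file lists sorted by inode number can be sketched as a single merge-style pass. The sketch below is illustrative only. The `FileEntry` layout and the function name `find_deleted` are hypothetical, and it assumes that an inode number uniquely identifies a file across the two backups. Because inode numbers can be reused after deletion, a practical implementation would need an additional check.

```python
from typing import List, Tuple

FileEntry = Tuple[int, str]  # (inode number, path); hypothetical layout


def find_deleted(previous: List[FileEntry],
                 current: List[FileEntry]) -> List[FileEntry]:
    """Return entries present in `previous` but absent from `current`.

    Both lists must already be sorted by inode number.
    """
    deleted: List[FileEntry] = []
    i = j = 0
    while i < len(previous):
        if j >= len(current) or previous[i][0] < current[j][0]:
            # Inode exists only in the previous list: the file was deleted.
            deleted.append(previous[i])
            i += 1
        elif previous[i][0] == current[j][0]:
            # Inode exists in both lists: the file still exists.
            i += 1
            j += 1
        else:
            # Inode exists only in the current list: a newly created file.
            j += 1
    return deleted


last_full_list = [(1, "/data/a"), (2, "/data/b"), (4, "/data/d")]
current_full_list = [(1, "/data/a"), (3, "/data/c"), (4, "/data/d")]
print(find_deleted(last_full_list, current_full_list))  # [(2, '/data/b')]
```

Even with this linear-time pass, every entry of both lists is visited, which is why the cost grows with the total number of files rather than with the number of changed files.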
FIG. 2 illustrates the process of the prior art differential backup method of FIG. 1 together with the intermediate files generated at the corresponding operations. The conventional differential backup with inode scan starts at operation S1100, and a full file list of all the files whose last update time is later than the time of the last backup is created by inode scan at operation S1101, at which point a full file list 1110 is generated.
Then at operation S1102, the previous full file list associated with the last backup is retrieved, and the two intermediate files are compared at operation S1103 to create a deleted file list 1111. At operation S1104, the deleted file list 1111 and the full file list 1110 that have been created are stored in appropriate storage areas in association with the current backup.
As outlined above, the conventional differential backup with inode scan entails the processing load of a file comparison for generating the deleted file list 1111, in addition to that of generating the full file list 1110 by inode scan. Additionally, since the conventional differential backup method requires storage of extra intermediate files, namely the full file list 1110 and the deleted file list 1111, aside from a differential file, storage capacity for the intermediate files also has to be reserved. The sizes of the intermediate files themselves are now non-negligible, as the volume of data to be backed up can be on the order of several terabytes and an increasing number of files are frequently accessed and modified over networks.
As another example, another backup method saves, in a database, the metadata of all files that were backed up in the past. Such a method employs a scheme in which, when a backup process is activated, the metadata of all the files saved in the database are compared with the metadata of all the files obtained by inode scan so as to detect files that have been updated or created after the last backup and/or deleted files. That is, such a method (mmbackup) requires, in addition to the inode scan itself, a comparison of the metadata of all files obtained at the last backup with the metadata of all files obtained by the inode scan.
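The metadata comparison described above can be illustrated with the following sketch. It is not the implementation of mmbackup or of any specific product. An in-memory dictionary stands in for the database, the metadata is reduced to a hypothetical `(size, mtime)` pair keyed by path, and the names `Changes` and `compare_metadata` are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple

Metadata = Tuple[int, float]  # (size, mtime); hypothetical reduced metadata


class Changes(NamedTuple):
    created: List[str]
    updated: List[str]
    deleted: List[str]


def compare_metadata(saved: Dict[str, Metadata],
                     scanned: Dict[str, Metadata]) -> Changes:
    """Classify files by comparing saved metadata with a fresh scan.

    `saved` stands in for the metadata database of the last backup;
    `scanned` stands in for the result of the current inode scan.
    """
    created = sorted(p for p in scanned if p not in saved)
    deleted = sorted(p for p in saved if p not in scanned)
    updated = sorted(p for p in scanned
                     if p in saved and scanned[p] != saved[p])
    return Changes(created, updated, deleted)


saved_db = {"/data/a": (10, 100.0), "/data/b": (20, 100.0)}
scan_result = {"/data/a": (12, 150.0), "/data/c": (5, 160.0)}
print(compare_metadata(saved_db, scan_result))
```

As in the file list comparison, every saved record and every scanned record is examined, so the work is proportional to the total number of files regardless of how few have changed.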