The present invention relates to a method for restoring a plurality of pieces of data into a data processing system.
A data processing operation site prepares for disaster by storing data in the operation site and also backing up the data in a recording medium. An example of the recording medium is a low-cost magnetic tape medium (a tape medium).
The tape medium to which data is backed up is transported to a data processing restoration site prepared for disasters (a restoration site) and is stored therein. In the restoration site, a data processing system composed of the same devices as those in the operation site is built in advance. Thus, even if a disaster occurs in the operation site, the data can be restored to the same state as that in the operation site from the stored tape medium in the restoration site, so that operations can be restarted at the point where the data is backed up.
Conventional data restoration has been performed by reading all files stored in a tape medium and then writing the files to a hard disk drive (HDD), a solid state drive (SSD), or the like in a restoration site. Restoring a large number of files or large-sized data takes a long time, thus hindering rapid resumption of operations.
Operation sites adopt a cluster system in which a plurality of computers are connected so that, even if one computer halts due to a failure or the like, the entire system does not halt and the processing can be continued while the failed computer is repaired or replaced. In this cluster system, the individual computers are called nodes, and distributed storage of data and backup to the storage devices (disks) that the individual nodes manage are performed using a software component, such as a general parallel file system (GPFS).
Data backup and restoration using the GPFS may be executed by a method as shown in FIG. 1. As shown in FIG. 1, the operation site has a system configuration including a file system 10 serving as a GPFS, a disk 11 serving as a storage device from/to which data is read and written at high speed, and a tape pool 12 from/to which data is read and written at low speed. The restoration site has the same system configuration as that of the operation site, including a file system 13, a disk 14, and a tape pool 15.
In a normal operation for storing data in the operation site, the file system 10 stores a copy of the data, as a file, in the disk 11 and also in a tape medium 16 in the tape pool 12. At backup, the file system 10 backs up only the inode information, which includes the attribute information (meta-information) of the file. The state in which the data of the file is stored in both the disk 11 and the tape medium 16 is called a pre-migrated state.
At the restoration site, the meta-information on the file is restored by restoring the inode information to the file system 13, the state of the file is changed to a state in which the data of the file is stored only in a tape medium (a migrated state), and the restoration is completed. Since the data restoration method eliminates the need for reading all the files from the tape medium 16 and writing the files to the disk 14 in the restoration site, operations can be resumed rapidly without taking much time for restoration.
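The metadata-only restoration described above can be sketched as follows. This is an illustrative model, not the actual GPFS/LTFS implementation; the names `Inode` and `restore_inodes` and the state strings are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    """Illustrative stand-in for the inode (meta-information) of a file."""
    name: str
    size: int
    tape_id: str                    # tape medium holding the file data
    state: str = "pre-migrated"     # data resides on both disk and tape

def restore_inodes(backed_up_inodes):
    """Restore only the meta-information at the restoration site.

    No file data is read from tape; each file is simply marked
    'migrated', i.e. its data is resident on the tape medium only."""
    restored = {}
    for inode in backed_up_inodes:
        restored[inode.name] = Inode(inode.name, inode.size,
                                     inode.tape_id, state="migrated")
    return restored

fs = restore_inodes([Inode("/db/ledger.dat", 4096, "TAPE16")])
```

Because only inode records are copied, restoration time is proportional to the number of files rather than to the volume of file data on tape.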
However, after operations are resumed, the data of all the files is present only in the tape medium 16. Thus, at the first access to a file, the file must be read from the tape medium 16, and reading data from the tape medium 16 takes more time than reading it from the disk 14.
As shown in FIG. 2, a system is provided in which a file list of files that are likely to be used soon after restoration is created in accordance with preset rules, and the files included in the file list are read to the disk 14 in advance. This system is referred to as preferred recall. The files to be used soon after restoration can thereby be read from the disk 14 at high speed rather than from the tape medium 16.
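The preferred-recall idea can be sketched as follows. The rule format (glob patterns) and the in-memory tape and disk stores are assumptions made for this example, not the actual product interface.

```python
import fnmatch

def build_file_list(all_files, rules):
    """Select the files expected to be used soon after restoration,
    according to preset rules given here as glob patterns."""
    return [f for f in all_files
            if any(fnmatch.fnmatch(f, pattern) for pattern in rules)]

def preferred_recall(all_files, rules, tape, disk):
    """Copy the selected files from the tape store to the disk cache
    before operations resume, so their first access hits the disk."""
    for name in build_file_list(all_files, rules):
        disk[name] = tape[name]     # read from tape, write to disk
    return disk

# Toy stores: dicts standing in for the tape medium and the disk.
tape = {"/app/config.xml": b"<cfg/>", "/archive/old.log": b"..."}
disk = preferred_recall(tape.keys(), ["/app/*"], tape, {})
```

Files not matched by any rule stay in the migrated state and are recalled from tape only on demand.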
An example standard of a magnetic tape storage device for large-volume, high-speed reading and writing is LTO®. The latest LTO® generation, LTO-6, has a capacity of 2.5 TB and a transfer rate of 160 MB/s, and supports the linear tape file system (LTFS), a file system common among vendors, which allows the tape to be handled as a common file system under a plurality of OS environments, in the same manner as a USB memory or an SD card. As shown in FIG. 3, the LTFS divides the area on a tape 17 into two partitions, an index partition 18 and a data partition 19, and holds meta-information on each file (the attributes, path, physical position, and size of the file, an access control list, extended attributes, etc.) as an index so that the data on the tape 17 can be recognized as files by an OS.
When the tape 17 is loaded into the magnetic tape drive 15, the LTFS reads the meta-information written in the index file on the index partition 18, and after a CPU of the node expands the meta-information in memory, responds to requests for file system information from the OS. FIG. 4 shows an example of the index file on the index partition 18. As shown in FIG. 4, the index file is managed in a hierarchical (directory) structure, and the files and directories are described in the extensible markup language (XML) format. In this example, the name and size of each file are described in a text format.
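The kind of index file described above can be illustrated with a deliberately simplified XML fragment. The element names below are an assumption for this sketch; the real LTFS index schema carries considerably more meta-information (UIDs, extent lists, timestamps, extended attributes).

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal index fragment: a directory whose contents list
# each file's name and size, in the spirit of FIG. 4.
INDEX_XML = """
<directory>
  <name>data</name>
  <contents>
    <file><name>report.txt</name><length>1024</length></file>
    <file><name>image.bin</name><length>2048</length></file>
  </contents>
</directory>
"""

def list_files(index_xml):
    """Parse the index and return {file name: size in bytes}."""
    root = ET.fromstring(index_xml)
    return {f.findtext("name"): int(f.findtext("length"))
            for f in root.iter("file")}

files = list_files(INDEX_XML)
```

Because the index is plain text, the file system view of the tape can be reconstructed without touching the data partition.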
The LTFS is a file system that manages the files stored in the individual tape media 16, and an extended version, the LTFS enterprise edition (EE), is provided for use in an environment in which the file system is shared by a plurality of nodes, such as the GPFS. The LTFS EE stores the meta-information on the files stored in the tape media 16 in a shared disk 20 shown in FIG. 5, thereby allowing the meta-information to be shared by a node 1 and a node 2. The LTFS EE creates dentries files 21, with the same directory configuration as that of the user files of the node 1 and the node 2, in the shared disk 20 and adds the file attributes to the dentries files 21 to manage the meta-information.
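The dentries mechanism can be sketched as mirroring the user directory tree on the shared disk with small attribute-only files. The on-disk layout and the attribute keys used here are assumptions made for this example, not the actual LTFS EE format.

```python
import json
import os
import tempfile

def write_dentry(shared_root, user_path, attrs):
    """Create a dentries file on the shared disk that mirrors the
    user file's path and carries only its attributes, not its data."""
    dentry = os.path.join(shared_root, user_path.lstrip("/"))
    os.makedirs(os.path.dirname(dentry), exist_ok=True)
    with open(dentry, "w") as f:
        json.dump(attrs, f)

def read_dentry(shared_root, user_path):
    """Any node with access to the shared disk can read the
    meta-information back from the mirrored path."""
    with open(os.path.join(shared_root, user_path.lstrip("/"))) as f:
        return json.load(f)

# Node 1 records a file's attributes; node 2 reads them back.
shared = tempfile.mkdtemp()
write_dentry(shared, "/home/user/a.dat", {"size": 4096, "tape": "TAPE16"})
attrs = read_dentry(shared, "/home/user/a.dat")
```

Keeping the attributes on a shared disk lets every node resolve a file's location (including its tape medium) without mounting the tape itself.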