1. Field of the Invention
The present invention relates to a computer program product, system, and method for using a metadata image of a file system and archive instance to back up files in the file system.
2. Description of the Related Art
Typical data protection environments are based on client-server architectures. The backup server administers the common resources, such as disk storage and tape storage, which are used to store the backup data from the client machines. The backup server uses a database to store metadata and statistical information about the backup clients and the common storage. Furthermore, the backup server implements functionality to create an instant archive. This instant archive may comprise a point-in-time copy of the active database at the backup server to allow access to the backup as of the point-in-time the instant archive was created. Multiple backup clients connect to a single backup server and send their data for protection. The backup client decides the level of granularity used for the data protection. The traditional file level backup provides the whole file as the level of granularity for backup and restore, and it uses the path to the file as the unique identifier. Block level backups provide a single block as the level of granularity and use the block identifier as the unique identifier.
An image backup involves the backup of the complete contents of a physical storage medium. In International Business Machines Corporation's (“IBM”) General Parallel File System (GPFS™), a metadata image backup describes the ability to back up the metadata structure of a file system separately from the file object data. The metadata includes all components of the file system that are required to recreate the file system, but does not include the actual user data. A metadata image backup is typically used with the migrated data in a Hierarchical Storage Management (HSM) system to restore the file system in the event of a disaster. Once the metadata is restored, the file system may be brought on-line, providing user access to the files. The data may be restored in bulk using an optimized tape order, or it may be restored on demand as users access individual files. (IBM and GPFS are registered trademarks of IBM in the United States and other countries.)
Snapshot is a common industry term denoting the ability to create a point-in-time copy of all the data. Typically, snapshot creation is done instantly and the data is copied only when modified, referred to as copy-on-write, in order to preserve the data as of the point-in-time the snapshot was created. Snapshots are made available for use by other applications such as data protection, data analysis and reporting, and data replication applications. The original copy of the data continues to remain available and writable to applications without interruption, while the snapshot copy is used to perform specialized read-only functions. A software snapshot typically is provided by a file system (e.g., IBM's GPFS). The GPFS snapshot creates an instant copy of the entire directory structure. The files in the newly created directory structure are only links to the files in the active (live) file system. A write operation on a file causes the original data blocks to be copied into the snapshot structure before the write operation proceeds (copy-on-write). A hardware snapshot may be implemented inside the storage system, such as with the IBM DS8000®. (DS8000 is a registered trademark of IBM in the United States and other countries.) The hardware snapshot creates an instant copy of a primary disk image on a secondary disk. This copy-on-write mechanism works similarly to the software mechanism but operates at the storage device block level. Note that a snapshot, by itself, does not constitute a backup, since the data is not copied to a second storage medium and is thus not protected against failures. Snapshots are typically used by the backup system to create a point-in-time consistent version of the file system.
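The copy-on-write behavior described above can be illustrated with a minimal sketch. The class name CowVolume and its methods are hypothetical and are introduced only for illustration; they do not represent the GPFS or DS8000 implementation. The sketch assumes a volume modeled as an in-memory list of blocks.

```python
class CowVolume:
    """Toy block volume with copy-on-write snapshots (hypothetical, illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live (active) data, writable at all times
        self.snapshots = []          # one dict per snapshot: block index -> preserved data

    def create_snapshot(self):
        # Creation is instant: an empty snapshot structure is added, no data is copied.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Before the write proceeds, preserve the original block in every
        # snapshot that has not already saved its own copy of this block.
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # Unmodified blocks are read through to the live volume;
        # modified blocks come from the preserved copy.
        saved = self.snapshots[snap_id]
        return saved[index] if index in saved else self.blocks[index]
```

In this sketch a snapshot consumes space only for blocks that were overwritten after the snapshot was created, which is why the snapshot alone does not protect against loss of the underlying storage.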
The goal of recovery is to bring a file system back into use as quickly as possible. For large file systems, restoring all of the user data from off-line media, such as tapes, may take an extended period, such as weeks. The time may be substantially reduced using an HSM system, by restoring the file system's metadata, such as the directories and file attributes, and not restoring the actual data. In a typical file system, the metadata represents about 1% of the total data, thus reducing the time for recovery to minutes or hours. The actual data is restored when it is accessed by the user in the same manner as an on-demand recall from off-line HSM storage. The on-demand recall depends on two underlying features. First, there must be a way to intercept the user access (such as read or write) and suspend the user thread; meanwhile a signal is sent to the HSM system to restore the data. Once the data is restored, the user thread is resumed. Typically, for an HSM system this mechanism is part of the X/Open Data Management API standard (“DMAPI”). Second, the HSM system assigns the data a unique external identifier which corresponds to a database entry that contains the physical location of the data. This external identifier does not depend on the name of the file, or the path to the file, or even on the file's location such as its physical disk address or logical inode number. The external identifier remains unchanged even as the file changes.
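The two features on which on-demand recall depends can be sketched as follows. All names here (HsmStore, FileEntry, read_file) are hypothetical and are not part of DMAPI or any HSM product. The sketch assumes that a blocking function call stands in for suspending and resuming the user thread, and that an in-memory dictionary stands in for the database that maps an external identifier to the physical location of the data.

```python
class HsmStore:
    """Hypothetical off-line store keyed by external identifier."""

    def __init__(self):
        self._location_db = {}   # external identifier -> stored data (stands in for a tape location)
        self._next_id = 0

    def migrate(self, data):
        # Assign an identifier that is independent of file name, path, and inode number.
        ext_id = f"ext-{self._next_id}"
        self._next_id += 1
        self._location_db[ext_id] = data
        return ext_id

    def recall(self, ext_id):
        return self._location_db[ext_id]


class FileEntry:
    """Restored metadata for one file; data is None until the file is recalled."""

    def __init__(self, path, ext_id):
        self.path = path
        self.ext_id = ext_id
        self.data = None


def read_file(entry, hsm):
    # Intercept the access: if only metadata is resident, the caller waits
    # here while the data is recalled, then the read continues normally.
    if entry.data is None:
        entry.data = hsm.recall(entry.ext_id)
    return entry.data
```

Because the lookup uses only the external identifier, renaming or moving the file after its metadata is restored does not affect the recall in this sketch.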
A backup and restore system may consist of the following components: a client or computer system that will be protected; a data protection client which coordinates the protection of the client system and initiates and monitors the data transfer for backup and restore; and a data protection server which manages storage devices that are used to store the data of protected client systems and implements a scheduler that can initiate the data protection. A common network infrastructure, such as Ethernet, is used for communication between the components, and a Storage Area Network (SAN) may be used for data transfer.
A backup operation is started to initiate and monitor the backup of the data in a file system or a subset of a file system. The backup may be triggered by a scheduler or manually by the administrator. A backup session is established to read the data and send the data to a backup server for storage.
A restore operation may be triggered by the administrator. The backup client connects to and establishes a restore session with the backup server and requests the data. The restore session ends once the data has been successfully restored and written to the client file system.
A classic file based backup and restore involves a regular file system scan to collect the required information for the backup. The backup will be done at the object level. A restore requires that each object be named (e.g. by pathname or inode number) and its real data can then be fetched when the object is restored.
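A file system scan of the kind described above may be sketched as follows. The function name scan_for_backup and the catalog format are hypothetical. The sketch assumes that a file is selected for backup when its size or modification time differs from the values recorded at the previous backup; the text above does not specify the selection criteria, so this comparison is an assumption.

```python
import os


def scan_for_backup(root, catalog):
    """Return the paths under root that are new or changed since the last scan.

    catalog maps path -> (size, mtime_ns) as recorded by the previous backup
    (hypothetical format); it is updated in place.
    """
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)   # the path is the unique identifier
            st = os.lstat(path)
            signature = (st.st_size, st.st_mtime_ns)
            if catalog.get(path) != signature:
                candidates.append(path)
                catalog[path] = signature
    return sorted(candidates)
```

Every object is visited on every scan, so the cost of the scan grows with the number of files in the file system rather than with the amount of changed data.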
A classic block level backup and restore requires change tracking at the storage device block level to collect the required information for the backup. The backup and the restore happen at the storage media block level.
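Block level change tracking may be sketched as follows. The class TrackedDevice and the function restore_blocks are hypothetical. The sketch assumes that the set of changed block identifiers is kept in memory and is cleared after each backup; an actual storage device may track changes differently.

```python
class TrackedDevice:
    """Toy block device that records which blocks changed since the last backup."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        self.changed = set()     # block identifiers written since the last backup

    def write_block(self, block_id, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[block_id] = data
        self.changed.add(block_id)

    def incremental_backup(self):
        # Copy only the changed blocks, keyed by block identifier, then reset tracking.
        delta = {block_id: self.blocks[block_id] for block_id in sorted(self.changed)}
        self.changed.clear()
        return delta


def restore_blocks(blocks, delta):
    # Restore also operates per block: each saved block is written back by identifier.
    for block_id, data in delta.items():
        blocks[block_id] = data
```

Unlike the file level scan, no knowledge of files or paths is needed here; the block identifier alone names what is backed up and restored.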
A software snapshot based backup and restore requires that the file system which is used for the backup provides software snapshot functionality. A snapshot primarily creates a point-in-time copy of the data. After taking the snapshot the backup procedure reads the snapshot data to protect the file system by copying to backup media.
A hardware snapshot based backup and restore requires that the hardware which is used for the backup provides hardware snapshot functionality. A snapshot creates a point-in-time copy of the data. After taking the hardware snapshot, the backup procedure reads the hardware snapshot data to protect the file system by copying to the backup media.
An HSM based metadata image backup and restore requires HSM management of the file system. An integrated backup/archive and HSM server provides the means to utilize a single tape library for both backup data and HSM storage. Protected file data resides either on tape only or on both live disk and tape. A file system metadata image (inode data) is constructed for backup and must be sent to the backup server.
There is a need in the art for improved techniques for backup and restore of objects in a file system.