1. Field of the Invention
This invention relates to computer systems and, more particularly, to backup and restoration of data within computer systems.
2. Description of the Related Art
There is an increasing need for organizations to protect data that resides on a variety of client devices via some type of backup mechanism. For example, numerous client devices may be coupled to a network to which one or more media servers are also coupled. The media servers may include or be further coupled to a storage pool consisting of one or more disk storage devices, tape drives, or other backup media. A backup agent on each client device may convey data files to the media server for storage according to a variety of schedules, policies, etc. For instance, large backup datasets may be moved from a client device to a media server configured to store data for later retrieval, thereby protecting data from loss due to user error, system failure, outages, disasters, etc., as well as archiving information for regulatory compliance, workflow tracking, etc. Backup media of the type described above may commonly store datasets in a format that will be referred to herein as an archival format.
Unfortunately, backup and restore to media in archival format may be slow and may require an administrator or technician to retrieve and mount storage media, etc. In order to make data more readily available and to reduce the storage capacity required, single-instance storage techniques have become popular. In a single-instance storage system, data is stored in segments, with each segment having a fingerprint that may be used to unambiguously identify it. For example, a data file may be segmented, and a fingerprint calculated for each segment. Duplicate copies of data segments are replaced by a single instance of the segment and a set of references to the segment, one for each copy. In order to retrieve a backup file, a set of identifiers (e.g., fingerprints) is sent to the single-instance storage system, where it is compared to the fingerprints of data stored in a storage pool. For each matching fingerprint, a data segment is retrieved. The resulting segments may be re-assembled to produce the desired file.
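By way of illustration only, the segment-and-fingerprint scheme described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular product: the fixed segment size, the use of SHA-256 as the fingerprint function, and the names SingleInstanceStore, backup, and restore are all hypothetical and are not taken from the text above.

```python
import hashlib
from typing import Dict, List

# Assumption: fixed-size segments, kept tiny so the example is easy to follow.
SEGMENT_SIZE = 4


def fingerprint(segment: bytes) -> str:
    """Assumption: SHA-256 serves as the segment fingerprint."""
    return hashlib.sha256(segment).hexdigest()


class SingleInstanceStore:
    """Hypothetical in-memory stand-in for a single-instance storage pool."""

    def __init__(self) -> None:
        self.segments: Dict[str, bytes] = {}   # fingerprint -> single stored instance
        self.ref_counts: Dict[str, int] = {}   # fingerprint -> number of references

    def backup(self, data: bytes) -> List[str]:
        """Segment the data, store each new segment once, and return the
        ordered list of fingerprints that identifies the file."""
        fingerprints = []
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fp = fingerprint(segment)
            if fp not in self.segments:
                self.segments[fp] = segment    # first copy: store the segment
            # every copy, duplicate or not, is recorded as a reference
            self.ref_counts[fp] = self.ref_counts.get(fp, 0) + 1
            fingerprints.append(fp)
        return fingerprints

    def restore(self, fingerprints: List[str]) -> bytes:
        """Look up each fingerprint and re-assemble the segments in order."""
        return b"".join(self.segments[fp] for fp in fingerprints)


store = SingleInstanceStore()
file_a = b"AAAABBBBAAAA"
file_b = b"BBBBCCCC"
ids_a = store.backup(file_a)
ids_b = store.backup(file_b)
restored_a = store.restore(ids_a)
```

In this sketch the two files together contain five segments, but only the three distinct segments are stored; the remaining copies exist only as references. The list of fingerprints returned by backup plays the role of the set of identifiers sent to the storage system at retrieval time.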
In order to facilitate retrieval and re-assembly of data objects from data segments, one or more metadata managers may store metadata describing the data stored in a single-instance storage pool in a catalog that is separate from the storage pool itself. Such a catalog may be referred to as a metabase. Metadata managers may be located on separate hosts or co-located on hosts that include a single-instance storage pool. Accordingly, one or more metabases hosted in a variety of locations may contain data describing each storage pool.
It is possible for both archival format backup techniques and single-instance storage techniques to be used in the same system. Archival format techniques have an advantage in that a snapshot of the state of a host's data may be stored and retrieved intact. This may be desirable from a legal or regulatory point of view. Using archival format techniques, it is straightforward to store multiple versions of a dataset that are created at different points in time and to retrieve these datasets based on a time of interest. Unfortunately, archival format techniques may be time-consuming and cumbersome. There may be only a selected set of points-in-time for which an archival version of a dataset exists. In addition, it may be difficult to create a backup dataset at a busy time when a host's data is changing frequently, such as at the end of a quarter, although these may be the times for which a backup dataset is most often desired. In contrast, single-instance storage backup operations may take less time because de-duplication reduces the amount of data to be transferred and stored. The resulting smaller datasets may be stored on disk media rather than removable media, making for an easier backup process. These factors allow more frequent backups, including backups at critical reporting times such as the end of a quarter. Unfortunately, single-instance data is de-duplicated, which means that data objects or data segments from a given point-in-time that duplicate already-stored data are not copied to the single-instance storage pool, making reconstruction of a dataset from a previous point-in-time more difficult.
In addition to the above considerations, archival format backup techniques and single-instance storage techniques are generally executed through different software interfaces. These interfaces may not present a consistent set of attributes of their respective backup datasets. Also, they may present different models for dataset retrieval. For example, archival format backup datasets may be retrieved based on a particular timestamp, whereas single-instance storage backup datasets may be retrieved based on each data object's fingerprint, regardless of the time at which it was stored.
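The difference between the two retrieval models may be illustrated with the following sketch. It is offered only as an example under stated assumptions: the integer timestamps, the in-memory data structures, the SHA-256 fingerprint, and the function names retrieve_archival and retrieve_single_instance are hypothetical and do not correspond to any actual backup software interface.

```python
import bisect
import hashlib

# Archival model (assumed layout): whole snapshots, sorted by the time taken.
archival_snapshots = [
    (100, {"report.txt": b"draft"}),
    (200, {"report.txt": b"final"}),
]


def retrieve_archival(snapshots, time_of_interest):
    """Return the most recent snapshot taken at or before time_of_interest,
    or None if no snapshot is that old."""
    times = [taken_at for taken_at, _ in snapshots]
    index = bisect.bisect_right(times, time_of_interest) - 1
    if index < 0:
        return None
    return snapshots[index][1]


# Single-instance model (assumed layout): data objects keyed by fingerprint only.
def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


segment_pool = {fingerprint(b"draft"): b"draft", fingerprint(b"final"): b"final"}


def retrieve_single_instance(pool, fp):
    """Return the data object with the given fingerprint; the time at which
    it was stored plays no part in the lookup."""
    return pool.get(fp)


at_150 = retrieve_archival(archival_snapshots, 150)
by_fp = retrieve_single_instance(segment_pool, fingerprint(b"final"))
```

The first lookup is keyed by time and returns an entire dataset, while the second is keyed by content and returns a single data object, which is why the two kinds of interface do not naturally present the same retrieval model.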
In view of the above, an effective system and method that accounts for these issues is desired for extracting data from both single-instance storage pools and archival format storage pools through a common interface and converting the results to an archival format.