1. Field of the Invention
The present invention is related to the protection of computer data and, in particular, to a system and method for making backups made to offline storage media, such as tapes, that are not directly accessible as file-structured devices.
2. Description of the Prior Art
A backup procedure is a function that is typically included in most computer operating system software. One of the most pressing backup problems over the last several years has been the time vs. volume dilemma. Storage capacity and actual online storage volumes have increased at a geometric rate, roughly doubling every two years. However, the bandwidth of storage subsystems, that is, the rate at which data can be transferred into and out of the storage subsystem, has increased at a much slower rate. Consequently, the time required to make a complete copy of online storage has steadily increased. In addition, most backup procedures use the file system to produce file-coherent backups. This imposes additional overhead that considerably reduces the effective bandwidth of the storage subsystem. Many hours are required to make a full backup of a large scale installation.
At the same time, many computer installations are faced with increasingly stringent uptime requirements. The xe2x80x98backup windowxe2x80x99 (i.e., the time during which the data is stable so that a coherent backup can be made) continues to shrink. In many cases, the available backup window is already smaller than the time required to create a full backup. Computer installations have applied a number of ad hoc measures to address these difficulties, with varying degrees of success. Consequently, many installations are running with inadequate or no backup coverage because the backup window is inadequate.
One approach is the physical backup, a brute force approach that copies the disk volume block for block, ignoring the file structure. The physical backup can operate at the maximum possible data rate of the storage subsystem. However, the physical backup does suffer from certain disadvantages. First, all activity on the disk volume must be completely frozen for the backup to be useful because there is no coordination with the file system. Second, recovery of individual files from a physical backup is cumbersome because the entire backup must be restored to disk to process the file structure. Third, even the maximum storage bandwidth may be inadequate in a very large-scale storage environment to perform a full physical backup in the available backup window.
Another well-known approach is the incremental file backup. In this approach, individual files are backed up if they have been modified since the previous backup. If they have not changed, they are not backed up. This method reduces the volume of data to be backed up to the volume of files that have changed. It works well in an environment where files are relatively small and are typically modified in their entirety. It does not work well when files are large, and typical updates modify a small part of the file, because even with a small modification the entire file must be backed up. Also, complete reconstruction of a data volume from incremental file backups can be problematical because files that are deleted during the life of the volume will reappear when successive incremental backups are restored. Depending on the design of the file system and the backup, incremental restores can introduce other inaccuracies, compared to the original volume.
However, the basic incremental backup method suffers from the disadvantages that a considerable amount of time is spent processing the file structure to locate files that need to be backed up, and the process of reconstructing a disk volume from incremental backups is complex and trouble-prone. Accordingly, the system manager would typically perform periodic full backups in addition to the incremental backups to limit the risk of recovering with incremental backups alone.
Another approach is disclosed in U.S. Pat. No. 5,835,953, xe2x80x9cBackup system that takes a snapshot of the locations in a mass storage device that has been identified for updating prior to updating,xe2x80x9d issued to Ohran. This basic incremental backup method includes maintaining a xe2x80x9cvirtual diskxe2x80x9d subsystem capable of generating snapshots and making a full copy of the snapshot for remote disk storage. An initial and a subsequent snapshots are obtained. Snapshot mapping data is used to determine the data blocks which have changed from the initial snapshot to the subsequent snapshot. The changed blocks are then copied to the remote storage, the initial snapshot is deleted, and the process is continued as needed.
The method disclosed in Ohran ""953, for example, provides a complete backup copy of the data volume to allow recovery if the original volume is lost. However, the prior art does not address situations in which individual files need to be recovered, such as when a file is erroneously deleted or when an application fails and writes incorrect data. Once a snapshot and copy cycle have been performed using a conventional method, the previous (and possible the only valid) file contents are lost. Thus, there is a need in the art for an effective backup strategy which preserves old versions of the file contents at suitable intervals to allow recovery when errors are subsequently detected.
The data volume in a computer system can be protected by first acquiring a base state snapshot and a subsequent series of data volume snapshots. A plurality of snapshot difference lists can be generated by identifying those data blocks which differ between sequential snapshots. A precedent snapshot difference list, generated by identifying the data blocks in any snapshot differing from the data blocks in a subsequent snapshot, is used to recover files without incurring a full restore. The data blocks described by the snapshot difference list are copied to backup storage and the snapshot is deleted. File recovery is accomplished by overwriting data from a current snapshot list with one or more precedent backups. A succedent snapshot difference list, generated by identifying the data blocks in any snapshot differing from the data blocks in a previous snapshot, is used to restore a data volume. The data volume is restored by restoring the base state data with data blocks contained in one or more succedent backups.