This invention relates generally to methods and systems for backing up computer data in computer systems, and more particularly to methods and systems for verifying the accuracy and integrity of backup data.
Computer systems store data that is unique and often critical; if lost, such data is frequently expensive, difficult, or impossible to replace. The data is normally stored on hard disks or other storage technology that is subject to failure. Data may also be lost through theft, fire, or other disaster, and such loss is frequently permanent. Accordingly, backup methods and systems have been developed that maintain controlled redundancy of data, so that the data can be recovered in the event of a disaster and its loss avoided or minimized.
Backup systems copy source data from a computer source volume to backup media so that, if the original source data is lost, it may be restored from the backup copy. Because data in a computer system is continuously being created or modified, backup must be performed frequently to keep the backup copy of the data reasonably current. Although backup may run continuously during the operation of a computer, this is usually not preferred because backup can slow down or prevent normal computer system operations. Accordingly, most backup is batch-oriented and performed at a predetermined time during a backup window, such as at night when the computer systems are not being used for normal processing operations. This is particularly the case for systems that back up data in the computer networks of large enterprises, which have many different computers and many different source volumes to back up.
Depending upon the volume of original source data to be backed up and the type of backup technology employed, backup may be a time-consuming and burdensome process. Suppose the source data stored on a source volume being backed up, such as a hard disk, spans multiple backup media, i.e., exceeds the capacity of a single medium such as a tape or optical disk, and a single backup device is employed. The backup process must then be interrupted while a new medium is mounted on the backup drive. This also frequently requires recopying a portion of the previously copied data to the new medium, particularly if the first medium ran out in the middle of copying a data file. The individual pieces of backup media, such as tapes, CD/DVD discs, disks, or cartridges, are referred to as members of a backup set, and a backup set may back up one or more source volumes.
For enterprises having a network comprising many different computers and source volumes, the backup window may not afford enough time to copy all of the source data that needs to be backed up. Depending upon the backup media, the volume of source data, and the network speed, considerable time may be required simply to copy the source data. Moreover, in addition to copying source data to the backup media, it is desirable to verify that the source data was copied correctly so that an accurate backup set is maintained. Backup processes therefore normally include a verification process to ensure that source data is copied correctly to the backup media.
Verification as presently done on known systems involves comparing the backup data in the backup set with the original source data, and is usually performed immediately after the source data is copied to the backup media. If the backup set spans multiple media, every member to which data has been written must be remounted for verification. Thus, if a source volume of original data being backed up spans two media members, the backup process must first be interrupted to mount the second member when the first becomes full; then, during verification, the process must be interrupted again to remount the first member and verify the portion of the source data written on it, and once more to mount the second member and verify the source data written on it. Even if the data does not span multiple media, backing up multiple sources to a sequential-access medium such as tape conventionally requires backup, rewind, verification, rewind, and verification. This is time-consuming and causes wear and tear on the medium. Such time and overhead burdens may prevent backup and verification from being completed within the backup window. The system administrator may then be able to perform only a partial backup during the backup window, backing up only some of the source volumes that need to be backed up, and may have to schedule backup of the remaining source volumes for different times or days. Otherwise, the administrator may have to forgo verification, which is undesirable, or extend the backup window into the period of normal operations, which risks disrupting those operations or backing up files while they are being changed and is also undesirable.
Moreover, even if the system administrator is able to postpone verification until after the backup window, this is not a practical or effective solution. Verification would still require comparing the original source data with the backup data copied to the backup media, which requires access to the source volume and may disrupt normal operations. Also, if the original source data changed between the time it was copied to the backup media and the time verification is performed, the files would “miscompare,” indicating invalid backup data, and verification would fail even though the original data had been copied correctly to the backup media.
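The comparison-based verification described above, and the false failure that arises when it is deferred, can be illustrated with a minimal sketch. It assumes, purely for illustration, that a source volume and a backup set can each be modeled as an in-memory mapping from file names to file contents; the function `verify_by_comparison` and the file names are hypothetical and depict the conventional approach, not the method of the invention.

```python
from typing import Dict, List


def verify_by_comparison(source: Dict[str, bytes], backup: Dict[str, bytes]) -> List[str]:
    """Conventional verification (hypothetical helper): re-read the source and
    compare it byte for byte with each backed-up file. Returns the sorted
    names of files that miscompare."""
    miscompares = []
    for name, backup_bytes in backup.items():
        # Verification needs access to the current source data.
        source_bytes = source.get(name)
        if source_bytes is None or source_bytes != backup_bytes:
            miscompares.append(name)
    return sorted(miscompares)


if __name__ == "__main__":
    # Hypothetical source volume contents.
    source_volume = {"a.txt": b"alpha", "b.txt": b"beta"}
    # The backup copies every file correctly.
    backup_set = dict(source_volume)
    print(verify_by_comparison(source_volume, backup_set))  # []

    # The source file changes after backup but before deferred verification.
    source_volume["b.txt"] = b"beta, edited"
    # Reports a miscompare even though the backup copy was correct.
    print(verify_by_comparison(source_volume, backup_set))  # ['b.txt']
```

The second check fails only because the source changed. The backup copy of `b.txt` is still a faithful copy of what was on the volume when it was backed up.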
Verification is also desirable during other data transfer operations. For example, it is desirable to verify the integrity of data read from the backup media and copied to other media, as when archiving or making a duplicate copy of the backup data. Verifying these operations likewise requires access to the original source data and involves the same difficulties as verification of backup.
It is desirable to provide backup methods and systems which avoid the foregoing and other problems of known backup approaches by affording backup processes that have faster, more flexible, and more complete verification, that do not unduly burden the normal operations of the source computer systems being backed up, and that ensure greater accuracy and integrity of the backup data. It is to these ends that the present invention is directed.