This invention relates generally to methods and systems for backing up computer data in computer systems, and more particularly to methods and systems for verifying the accuracy and integrity of backup data in computer networks.
Computer systems store data that is unique and often critical, and, if lost, the data would frequently be expensive, difficult or impossible to replace. The data is normally stored on hard disks or other storage technology which is subject to the possibility of failure. Additionally, data may also be lost by theft, fire or other disaster, and frequently the data loss is permanent. Accordingly, backup methods and systems have been developed to maintain controlled redundancy of data to enable data to be recovered in the event of a disaster to avoid or minimize the loss of the data.
Backup systems copy source data from a computer source volume to backup media so that if the original source data is lost, it may be restored from the backup copy. Since data in a computer system is continuously being created or modified, it is important that the backup process be performed frequently to ensure that the backup copy of the data is reasonably current. Most backup operations are batch-oriented and performed at predetermined times during a backup window, such as at night when the computer systems are not being used for normal processing operations. This is particularly the case for systems which back up data in computer networks of large enterprises that have many different computers and many different source storage volumes to back up.
In enterprises having computer networks comprising many different computers and source volumes, backup may be distributed among one or more central backup servers having multiple backup media. For example, backup servers and media may be distributed across a LAN, a MAN or even a WAN, and backup may require data transfers across such networks to the distributed backup media. As is well known, data transferred over a network is susceptible to errors introduced during transmission. Such errors result in invalid data being copied to the backup media, and limit the usefulness of the backup set in the event data needs to be restored.
Because of the importance of backup data, it is necessary that an accurate backup data set be maintained. Accordingly, in addition to copying the source data to the backup media, it is normally required that backup data be verified after copying the source data to backup media. Verification ensures that the source data was copied correctly so that an accurate backup set is maintained, and verification is normally included as part of a backup process.
Known verification systems and methods involve comparing the backup data in the backup set with the original source data to determine whether the two sets of data match. Verification is usually done right after the source data is copied to the backup media. If the backup set spans multiple media, for example tapes or discs, it is necessary to remount all members of the backup media comprising the backup set in order to perform verification. This substantially increases the time and overhead burden of the backup and verification processes, and may prevent backup from being completed during the scheduled backup window. Accordingly, a system administrator may be able to perform only a partial backup during the backup window, backing up only some of the source volumes which need to be backed up. Otherwise, the administrator may be required to forgo the verification process, which is undesirable, or extend the backup window into the period of normal operations, running the risk of disrupting normal operations or trying to back up files as they are being changed, which is also undesirable. If the source file changes between the time it was backed up and the time verification is performed, a “miscompare” will occur even if the original source file was correctly copied to the backup media, causing verification to fail.
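By way of illustration only, the compare-based verification described above, and the manner in which a changed source file produces a miscompare, may be sketched in Python as follows. The function names `verify_by_compare` and `demo`, the chunk size, and the use of ordinary temporary files to stand in for a source volume and backup media are assumptions made for this sketch and are not drawn from any particular known backup system.

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed read size, chosen arbitrarily for the sketch


def verify_by_compare(source_path, backup_path):
    """Return True if the two files are byte-for-byte identical."""
    with open(source_path, "rb") as src, open(backup_path, "rb") as bak:
        while True:
            a = src.read(CHUNK_SIZE)
            b = bak.read(CHUNK_SIZE)
            if a != b:
                return False  # a "miscompare"
            if not a:
                return True  # both files ended with no difference found


def demo():
    """Back up a file, verify it, modify the source, and verify again."""
    workdir = tempfile.mkdtemp()
    try:
        source = os.path.join(workdir, "source.dat")  # stands in for the source volume
        backup = os.path.join(workdir, "backup.dat")  # stands in for the backup media
        with open(source, "wb") as f:
            f.write(b"original source data")
        shutil.copyfile(source, backup)  # the backup copy itself is correct

        verified_immediately = verify_by_compare(source, backup)

        # The source file changes after backup but before verification.
        with open(source, "ab") as f:
            f.write(b" plus a later modification")
        verified_after_change = verify_by_compare(source, backup)

        return verified_immediately, verified_after_change
    finally:
        shutil.rmtree(workdir)
```

In this sketch the second comparison fails although the backup copy was written correctly, because the comparison requires the source to remain unchanged for as long as verification is pending.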
Verification is usually performed by a backup server. Accordingly, even if the original source data has not changed, errors can occur in rereading the original source data and backup data and transmitting the data to the backup server for comparison. This is particularly a problem with data transfers over a network to a central backup server. If transmission errors occur, the “reread” original source data will be invalid and when compared with the backup data on the backup media, verification will fail even if the source data was originally copied correctly to the backup media. The backup data will be indicated to be invalid, and this will necessitate recopying the source data, usually during a subsequent backup process, resulting in inefficiencies. Moreover, until recopied, this will render the backup data unreliable and of little or no value should a disaster occur and recovery be necessary. An error may also occur during the transmission of the original source data for backup, resulting in the backup data being inaccurate.
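The false verification failure described above may likewise be illustrated with a minimal Python sketch. The helper names `flip_bit` and `server_side_verify` are hypothetical, and a single flipped bit is used merely as an assumed model of a transmission error; real network faults may take other forms.

```python
def flip_bit(data, byte_index, bit=0):
    """Return a copy of data with one bit inverted (assumed error model)."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


def server_side_verify(reread_source, reread_backup):
    """Comparison as performed at the backup server on the data it receives."""
    return reread_source == reread_backup


original = b"original source data"
backup_copy = bytes(original)  # the source data was copied correctly

# Reread arrives intact: verification succeeds.
result_clean = server_side_verify(original, backup_copy)

# Reread is corrupted in transit: verification fails although the backup is valid.
result_noisy = server_side_verify(flip_bit(original, 3), backup_copy)
```

The comparison alone cannot distinguish an error in the backup data from an error introduced while rereading the source data, so a valid backup is reported as invalid.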
Errors can also occur during data transfer operations other than backup, and a verification process is desirable to ensure accurate transfer. For example, it is desirable to verify the integrity of data read from the backup media and copied to another medium, as for archiving or making a duplicate copy of the backup data, or for a restore in the case of a failure of the source media. These operations involve the same difficulties as those encountered during verification of backup.
It is desirable to provide backup methods and systems which avoid the foregoing and other problems of known backup approaches by affording backup and verification processes that are efficient, accurate, and more reliable for verification of backup data and data transfers, especially over networks. It is to these ends that the present invention is directed.