As data storage systems become ever larger, providing efficient backup storage becomes increasingly important. Even if one is not concerned with the cost of the needed storage space, the time required to perform all the necessary copy operations becomes increasingly burdensome. For a large system, a full backup procedure can be time-consuming, requiring several hours or even days to complete. For this reason, backup procedures often provide “incremental” backups where only blocks or files that have changed since the last backup are copied. Typically, a full backup procedure is performed infrequently (for example, once initially and thereafter at long intervals such as once per month). Between full backups, incremental backups are created more frequently, for example, once per day. Examples of commercial incremental backup products include TRUE IMAGE™ from ACRONIS®, Inc. and NORTON GHOST™ from Symantec Corporation.
Backups can be used for a variety of purposes. They can be used to recover from user error when, for example, the user inadvertently deletes or overwrites a file. They can be used to recover from data loss due to hardware failure such as a hard disk failure. They can also be used to recover from software failures such as application or operating system crashes. The goal of recovery after a crash is to restore the last available known good operating state for the complete system. This can be done by rebooting the same hardware after restoring the file system from a suitable backup, but the recovery procedure can be very time-consuming if the entire file system must be restored. For this reason, virtual machines (VMs) are sometimes used for backup purposes. When a VM is used for backup purposes, it is typically not used as a running machine unless and until it is needed for restoring a failed machine. Typically, the VM is launched, booted, and tested only to verify functionality and then it is shut down; however, it can be brought back on-line quickly if and when needed to replace the failed source machine for which it is functioning as a backup.
Using a VM as a backup is useful in that, if the source machine goes down, the VM can be quickly powered on in its place. With traditional backup methods, a full system restore can take hours, while the VM can be up and running in a few minutes. But whether using traditional file system backups or VMs as backups, changes made since the last backup procedure are lost. Examples of commercial products that enable VMs to be used for backup include POWERCONVERT™ from PLATESPIN®, Ltd. and VEEAM BACKUP™ from Veeam Software.
To perform an incremental backup on a protected system, a backup application needs to track which blocks of a storage device of the protected system are changed between backup cycles and transmit the changed blocks to the virtual machine serving as the backup at the start of the next backup cycle. In most cases, a filter driver that keeps track of modified blocks can be installed on the protected system. However, in some cases, it may not be possible to use such a driver. In such cases, the backup application can perform a hash-based replication cycle, which involves reading all used blocks in the system, calculating their hashes, comparing them to the hashes of the blocks already stored in the backup, and then backing up the changed blocks. Reading all used blocks is an expensive operation in terms of the CPU and I/O resources involved. Moreover, because the operation takes so long, very short backup cycles cannot be achieved with such an approach.
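By way of illustration only, a hash-based replication cycle of the kind described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular product: the 4096-byte block size, the choice of SHA-256, the in-memory dictionaries standing in for the storage device and the backup's hash index, and the names `split_blocks`, `block_hash`, and `hash_based_cycle` are all hypothetical.

```python
import hashlib
from typing import Dict

BLOCK_SIZE = 4096  # assumed block size in bytes (hypothetical)


def split_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Split a raw image into blocks keyed by block index (hypothetical helper)."""
    return {
        offset // block_size: image[offset:offset + block_size]
        for offset in range(0, len(image), block_size)
    }


def block_hash(data: bytes) -> str:
    """Hash one block; SHA-256 is an assumption, any collision-resistant hash would do."""
    return hashlib.sha256(data).hexdigest()


def hash_based_cycle(used_blocks: Dict[int, bytes],
                     backup_hashes: Dict[int, str]) -> Dict[int, bytes]:
    """Run one replication cycle.

    Reads every used block, hashes it, and compares the hash with the one
    recorded for the backup. Returns the blocks that must be transmitted and
    updates backup_hashes in place to reflect the new backup state.
    """
    changed: Dict[int, bytes] = {}
    for index, data in used_blocks.items():  # every used block is read
        digest = block_hash(data)
        if backup_hashes.get(index) != digest:  # new or modified block
            changed[index] = data
            backup_hashes[index] = digest
    return changed
```

In this sketch, the first call with an empty `backup_hashes` returns every used block, which corresponds to a full backup; later calls return only the blocks whose contents differ from the backup. Note that the loop visits and hashes every used block on every cycle even when nothing has changed, which is the source of the CPU and I/O cost noted above.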