Various information protection techniques are used to improve the availability of information. For example, backup techniques are used to provide redundant copies of information. If the original copy of the information is lost, due to equipment failure or human error, the information can be restored from a backup copy made at an earlier point in time. Backup techniques include full backups, which create a point-in-time image of a set of information on a backup device, and incremental backups, which copy the portions of the set of information that are modified during a particular time period to a backup device.
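The distinction between full and incremental backups can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model in which a volume is a mapping from block number to block contents; the names `full_backup`, `incremental_backup`, and `restore` are illustrative only and do not correspond to any particular product or API.

```python
from typing import Dict, Set

# Hypothetical model (an assumption for illustration): block number -> block contents.
Volume = Dict[int, bytes]


def full_backup(volume: Volume) -> Volume:
    """Return a point-in-time image containing every block of the volume."""
    return dict(volume)


def incremental_backup(volume: Volume, modified: Set[int]) -> Volume:
    """Return only the blocks that were modified during the tracked period."""
    return {block: volume[block] for block in sorted(modified)}


def restore(full: Volume, *incrementals: Volume) -> Volume:
    """Rebuild a volume by applying incrementals, oldest first, on top of the full image."""
    restored = dict(full)
    for incremental in incrementals:
        restored.update(incremental)
    return restored


volume = {0: b"boot", 1: b"alpha", 2: b"beta"}
image = full_backup(volume)      # point-in-time image of all blocks

modified: Set[int] = set()
volume[1] = b"ALPHA"             # a write that occurs after the full backup
modified.add(1)

delta = incremental_backup(volume, modified)
recovered = restore(image, delta)
```

In this sketch the incremental backup holds a single block, while the full image holds all three, which is why incremental backups are typically much smaller than full backups.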
Replication is another information protection technique, used to maintain copies of information at separate locations. For example, information can be replicated at several different sites within a corporation's campus and/or at several of the corporation's campuses. If the information is replicated at different sites, and if a failure of the systems storing the information at one site is unlikely to cause a failure of the corresponding systems at another site, replication can increase the reliability of the information. Thus, if a disaster occurs at one site, an application that uses that information can be restarted using a replicated copy of the information at another site.
Many information protection techniques, such as backup and replication, involve making point-in-time copies of the information stored on a primary storage device and then tracking incremental changes to that information. For example, a typical backup technique involves taking a full backup of the primary storage device and then, after the full backup has completed, taking one or more incremental backups. As noted above, the full backup is a complete copy of all of the information stored on the primary storage device at a particular point in time. In contrast, the incremental backups are copies of only the portions of the information that have been modified during a particular range of time. Completion of the full backup can take a significant amount of time (e.g., several hours). If the first incremental backup represents all changes that occur since the point in time represented by the full backup, and if the information in the first incremental backup cannot be copied to a backup device until the full backup has completed, a significant amount of resources may be required to temporarily buffer the information that will ultimately be copied to a backup device when the incremental backup is taken.
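The buffering cost described above can be sketched as follows. This is a simplified illustration, not a description of any specific system: `ChangeBuffer` and `buffer_growth` are hypothetical names, and the buffer is assumed to keep only the latest contents of each block written since the point in time captured by the full backup.

```python
from typing import Dict, List, Tuple


class ChangeBuffer:
    """Hypothetical buffer holding the latest contents of each block written since the snapshot."""

    def __init__(self) -> None:
        self.pending: Dict[int, bytes] = {}

    def record(self, block: int, data: bytes) -> None:
        # A later write to the same block replaces the earlier buffered contents.
        self.pending[block] = data

    def size_bytes(self) -> int:
        return sum(len(data) for data in self.pending.values())

    def drain(self) -> Dict[int, bytes]:
        """Hand the buffered changes to the incremental backup and reset the buffer."""
        drained, self.pending = self.pending, {}
        return drained


def buffer_growth(writes: List[Tuple[int, bytes]]) -> List[int]:
    """Replay writes that arrive while the full backup runs; return the buffer size after each."""
    buffer = ChangeBuffer()
    sizes = []
    for block, data in writes:
        buffer.record(block, data)
        sizes.append(buffer.size_bytes())
    return sizes


# Illustrative writes arriving while the full backup is still in progress.
writes_during_full_backup = [(7, b"aaaa"), (9, b"bbbb"), (7, b"cccc"), (12, b"dddd")]
growth = buffer_growth(writes_during_full_backup)  # [4, 8, 8, 12]
```

The buffer cannot be drained until the full backup completes, so the longer the full backup runs, the more distinct blocks accumulate and the more temporary storage is consumed.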
Like backup techniques, replication techniques often depend on having both point-in-time copies of the information stored by the primary storage device and records of incremental changes to that information. For example, replication typically involves initializing a storage device at a remote site by restoring that storage device from a full backup of the primary storage device. After the storage device at the remote site has been initialized as a point-in-time copy of the primary storage device, incremental changes that have occurred at the primary storage device during the initialization of the remote site are replicated to the remote site. The primary site requires resources to temporarily buffer the incremental changes while the storage device at the remote site is being initialized.
As the above examples show, in typical situations, replication or backup activity that is based on incremental changes is delayed until the activity involving a full point-in-time copy (e.g., a full backup) has completed. In replication, for instance, the replication of incremental changes is delayed until the backup of the primary site and the restore at the remote site have both completed. This introduces both delay and expense into a data protection solution. A typical replication scenario involves taking a full backup of the primary storage device, shipping the backup to the secondary site, and restoring the replica from the backup. This process can take several days, and it must be completed before any of the incremental changes to information stored on the primary storage device are applied to the replica. As a result, there needs to be a mechanism at the primary site to track all of the changes that occur to the information on the primary storage device subsequent to the point in time captured by the full backup.
Traditionally, a primary replication site is configured to store the incremental changes that need to be replicated to the secondary site and that occur over a period of several hours (e.g., 24 hours), in order to provide protection against situations in which the link between the primary and secondary sites goes down. The amount of storage needed to store the incremental changes at the primary site can be determined based on the time span (e.g., 24 hours) and expected storage access patterns. The backup and restore process involved in initialization may require significantly more than that time span, however, due to the time needed to ship the backup copy to the secondary site. As a result, the primary site needs to be configured with enough storage to store more than the normal amount of incremental changes during the initialization process. Thus, additional storage is required at the primary site during initialization. Additionally, the recorded incremental changes are not transferred to the secondary site via the network until the backup and restore process has completed. Since there will be a larger amount of data to be transferred at the end of the backup and restore process than during a normal replication period, higher network bandwidth is required during the initialization process than during normal operation.
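The sizing argument above can be made concrete with a small worked calculation. All figures are assumptions chosen for illustration: a steady change rate of 2 GiB per hour, a normal buffering window of 24 hours, an initialization period of 72 hours, and a goal of clearing the backlog within 24 hours after initialization. The estimate is a simple upper bound that ignores any savings from repeated writes to the same blocks.

```python
GIB = 1024 ** 3


def buffer_bytes(change_rate_per_hour: float, hours: float) -> float:
    """Upper-bound storage needed to hold changes accumulated over the given time span."""
    return change_rate_per_hour * hours


def required_bandwidth(backlog_bytes: float, drain_hours: float,
                       change_rate_per_hour: float) -> float:
    """Bytes per hour needed to clear the backlog within drain_hours while keeping up with new changes."""
    return backlog_bytes / drain_hours + change_rate_per_hour


# Assumed, illustrative values (not taken from any real deployment).
change_rate = 2 * GIB          # bytes changed per hour at the primary site
normal_window_hours = 24       # link-outage protection window
initialization_hours = 72      # backup, shipment, and restore of the full copy
drain_hours = 24               # target time to clear the backlog afterwards

normal_storage = buffer_bytes(change_rate, normal_window_hours)          # 48 GiB
initialization_storage = buffer_bytes(change_rate, initialization_hours) # 144 GiB
catch_up_bandwidth = required_bandwidth(
    initialization_storage, drain_hours, change_rate)                    # 8 GiB per hour
```

Under these assumptions, initialization requires three times the storage of the normal window, and catching up afterwards requires four times the bandwidth of steady-state replication (8 GiB per hour versus 2 GiB per hour).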
As the above examples show, existing data protection techniques have the potential to incur undesirable delay and/or expense. Accordingly, new techniques are desired for handling full backups and incremental changes when performing data protection techniques such as backup and replication.