Data movement is a critical feature for disaster recovery appliances. There are numerous configurations where data are transmitted across the network for disaster recovery purposes: pairs of offices protecting each other, satellite offices transmitting to headquarters, and satellite offices transmitting to relay stations that consolidate and then transmit to one or more national data centers. Communication may occur over low-bandwidth links because customers are located in inhospitable locations, such as offshore sites or forests. For disaster recovery, the goal is to improve data compression during replication so that more data can be protected within a data movement window.
The challenge is to transfer all of the logical data (e.g., all files within the retention period) while reducing the amount of data transmitted as much as possible. Storage appliances achieve high compression by transferring metadata from which all of the files can be reconstructed, based on strong fingerprints of segments, followed by the unique data segments. One way to reduce network traffic is to identify the delta changes between a previous backup and a new backup at the time of the backup, and to transmit only that difference to a target storage system. However, the delta changes are typically not maintained for subsequent backups. Thus, when a new backup is to be transferred, the backup logic has to perform additional scanning to determine what needs to be moved, which unnecessarily degrades performance.
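
The fingerprint-based transfer described above can be illustrated with a minimal sketch. It is not the implementation of any particular appliance. The fixed 4-byte segment size, the use of SHA-256 as the strong fingerprint, and the names Target, missing, receive, reconstruct, and replicate are all assumptions made for illustration; production systems typically use much larger, often variable-size, segments. The source first sends a recipe (the ordered list of fingerprints), the target reports which fingerprints it does not already hold, and only those unique segments are then transmitted.

```python
import hashlib

SEGMENT_SIZE = 4  # assumption: tiny fixed-size segments, for illustration only


def segment(data, size=SEGMENT_SIZE):
    """Split a byte string into fixed-size segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(seg):
    """Strong fingerprint of a segment (SHA-256 is an assumed choice)."""
    return hashlib.sha256(seg).hexdigest()


class Target:
    """Hypothetical replication target holding segments keyed by fingerprint."""

    def __init__(self):
        self.store = {}

    def missing(self, recipe):
        """Return each fingerprint in the recipe that is not yet stored, once."""
        seen, needed = set(), []
        for fp in recipe:
            if fp not in self.store and fp not in seen:
                seen.add(fp)
                needed.append(fp)
        return needed

    def receive(self, segments):
        """Store transmitted segments after verifying their fingerprints."""
        for fp, seg in segments.items():
            if fingerprint(seg) != fp:
                raise ValueError("fingerprint mismatch")
            self.store[fp] = seg

    def reconstruct(self, recipe):
        """Rebuild the logical data from a recipe of fingerprints."""
        return b"".join(self.store[fp] for fp in recipe)


def replicate(data, target):
    """Send the recipe first, then only the segments the target lacks.

    Returns the recipe and the number of segment bytes actually sent.
    """
    segs = segment(data)
    recipe = [fingerprint(s) for s in segs]
    by_fp = dict(zip(recipe, segs))
    needed = target.missing(recipe)
    target.receive({fp: by_fp[fp] for fp in needed})
    return recipe, sum(len(by_fp[fp]) for fp in needed)


if __name__ == "__main__":
    target = Target()
    _, sent_first = replicate(b"AAAABBBBCCCC", target)
    recipe, sent_second = replicate(b"AAAABBBBDDDD", target)
    print(sent_first, sent_second)
    print(target.reconstruct(recipe))
```

In this sketch the second backup shares two of its three segments with the first, so only the one new segment crosses the network. Note that the source still has to segment and fingerprint the entire new backup to learn this, which is the kind of additional scanning the paragraph above identifies as a performance cost.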