Data movement is a critical feature for disaster recovery appliances. Data are transmitted across the network for disaster recovery purposes in numerous configurations: pairs of offices protecting each other, satellite offices transmitting to headquarters, and satellite offices transmitting to relay stations that consolidate the data and then transmit it to one or more national data centers. Communication may occur over low-bandwidth links because customers are located in inhospitable locations, such as offshore sites or forests. The goal for disaster recovery purposes is to minimize resource contention during data movement so that the movement can be accelerated without impacting other tasks.
The challenge is to transfer all of the logical data (e.g., all files within the retention period) while reducing the amount of data transmitted as much as possible. Storage appliances achieve high compression by transferring metadata that can reconstruct all of the files, based on strong fingerprints of segments, followed by the unique data segments.
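The fingerprint-based transfer described above can be illustrated with a minimal sketch. It is not the method of any particular appliance. It assumes fixed-size segmentation, SHA-256 as the strong fingerprint, and an in-memory dictionary standing in for the destination's segment store. The names `SEGMENT_SIZE`, `build_recipe`, `replicate`, and `reconstruct` are hypothetical and chosen only for illustration.

```python
import hashlib

# Hypothetical segment size, kept tiny so the example is easy to follow.
SEGMENT_SIZE = 4


def fingerprint(segment: bytes) -> str:
    """Strong fingerprint of a segment (SHA-256 is an assumption here)."""
    return hashlib.sha256(segment).hexdigest()


def build_recipe(data: bytes, size: int = SEGMENT_SIZE):
    """Split data into fixed-size segments and return (recipe, segments_by_fp).

    The recipe is the metadata: an ordered list of fingerprints that is
    sufficient to reconstruct the data from a segment store.
    """
    segments = [data[i:i + size] for i in range(0, len(data), size)]
    recipe = [fingerprint(s) for s in segments]
    segments_by_fp = dict(zip(recipe, segments))
    return recipe, segments_by_fp


def replicate(data: bytes, remote_store: dict):
    """Send the recipe, then only the segments the destination lacks.

    Returns the recipe and the number of segment bytes actually sent.
    """
    recipe, segments_by_fp = build_recipe(data)
    missing = {fp: seg for fp, seg in segments_by_fp.items()
               if fp not in remote_store}
    remote_store.update(missing)  # stands in for the network transfer
    sent_bytes = sum(len(seg) for seg in missing.values())
    return recipe, sent_bytes


def reconstruct(recipe, remote_store: dict) -> bytes:
    """Rebuild the logical data at the destination from the recipe."""
    return b"".join(remote_store[fp] for fp in recipe)


if __name__ == "__main__":
    remote = {}
    first = b"aaaabbbbaaaacccc"
    recipe, sent = replicate(first, remote)
    print(sent, reconstruct(recipe, remote) == first)
```

In this sketch a repeated segment is sent once, and a later transfer that shares segments with earlier data sends only the segments the destination has not yet stored. A production system would more likely use variable-size, content-defined segments and would also account for the size of the metadata itself.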
Typically, data movement operations (such as backup and replication) are scheduled manually and statically, based on human empirical knowledge. With increasing virtualization, globalized deployment, and 24×7 service level requirements across different service time zones, the scheduling task may not be as obvious as before. With thousands of virtual machines (VMs) sharing compute and storage resources and being managed for backup at a data center, it may become a daunting task to properly schedule central processing unit (CPU) and input/output (I/O) intensive operations, such as backup and replication, so that the underlying infrastructure resources are not overwhelmed. The current approach does not adequately take into account overall resource consumption over a period of time, or its changing nature, in order to adapt the scheduling of resource-intensive operations.