Data storage is a critical component of computing. A computing device includes a storage area in which data is stored for access by the operating system and applications. In a distributed environment, additional data storage may be provided by a separate device that the computing device accesses during regular operations. This kind of data storage is generally referred to as primary storage, in contrast with secondary storage, which computing devices can also access but which is generally used for backups. For data protection purposes, it is important to regularly copy data from primary storage to secondary storage. While early backup strategies periodically created complete (full) backups, an alternative technique is to transfer only the data modified since the previous backup (an incremental backup). By stitching together the newly modified data with a previous complete copy on the secondary storage, a new full backup can be reconstructed.
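By way of illustration only, the stitching of incrementally modified data onto a previous full copy may be sketched as follows. The sketch assumes that a backup can be modeled as a mapping from block number to block contents; the function name apply_incremental and this representation are hypothetical and are not taken from any particular storage system.

```python
def apply_incremental(full_backup: dict[int, bytes],
                      incremental: dict[int, bytes]) -> dict[int, bytes]:
    """Return a new full backup built from a previous full copy plus changes.

    Both arguments map a block number to that block's contents (an assumed,
    simplified representation). The inputs are left unmodified.
    """
    new_full = dict(full_backup)   # start from the previous complete copy
    new_full.update(incremental)   # overlay only the blocks that changed
    return new_full
```

For example, applying an incremental backup containing only block 1 to a full backup of blocks 0 through 2 yields a new full backup in which block 1 holds the new contents and blocks 0 and 2 are carried over unchanged.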
At a primary storage system, the block numbers of a storage volume that have been modified (that is, written to) may be tracked by a bitmap, which is referred to as a changed block map. A backup operation reads and backs up only the changed blocks of the storage volume, without traversing a file system to identify changed files. To ensure that the backup is also consistent, when a backup request is received at a primary storage system, a snapshot of the storage volume is captured, and the blocks indicated as modified are read from the snapshot and transmitted from the primary storage system to a secondary storage system while the primary storage system is still receiving further writes from a host. The secondary storage system is also referred to as a target storage system; the terms secondary storage system, secondary storage, and target storage system are used interchangeably within the specification. When the primary storage receives write requests from computing devices, it may write the data to the same locations that are in the process of being backed up to the secondary storage. To ensure data integrity at such locations, the primary storage may perform extra operations such as copy-on-write, which copies the data at a storage location to a corresponding location in the snapshot as part of processing a write request. Such an operation incurs significant overhead and degrades backup performance.
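By way of illustration only, the interaction between a changed block map, a snapshot, and copy-on-write may be sketched as follows. The sketch is a simplified in-memory model under stated assumptions: the class Volume, its methods, and the helper functions changed_blocks and incremental_backup are hypothetical names, the changed block map is assumed to be reset when a snapshot is captured, and the snapshot is assumed to hold only the blocks preserved by copy-on-write.

```python
class Volume:
    """Toy in-memory storage volume with a changed block map (assumed model)."""

    def __init__(self, num_blocks: int, block_size: int = 4) -> None:
        self.num_blocks = num_blocks
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        # One bit per block: a set bit means the block was written to.
        self.changed = bytearray((num_blocks + 7) // 8)
        # While a snapshot is active, maps block number -> preserved contents.
        self.snapshot = None

    def write(self, n: int, data: bytes) -> None:
        # Copy-on-write: before the first overwrite of a block after the
        # snapshot was captured, preserve the old contents in the snapshot.
        if self.snapshot is not None and n not in self.snapshot:
            self.snapshot[n] = self.blocks[n]
        self.blocks[n] = data
        self.changed[n // 8] |= 1 << (n % 8)

    def take_snapshot(self) -> bytes:
        # Freeze the current map for the backup and start tracking afresh.
        frozen = bytes(self.changed)
        self.changed = bytearray(len(self.changed))
        self.snapshot = {}
        return frozen

    def read_snapshot(self, n: int) -> bytes:
        if self.snapshot is None:
            raise RuntimeError("no active snapshot")
        # Preserved copy if the block was overwritten, else the live block.
        return self.snapshot.get(n, self.blocks[n])


def changed_blocks(bitmap: bytes, num_blocks: int) -> list[int]:
    """List the block numbers whose bit is set in the changed block map."""
    return [n for n in range(num_blocks) if bitmap[n // 8] & (1 << (n % 8))]


def incremental_backup(vol: Volume, frozen_map: bytes) -> dict[int, bytes]:
    """Read only the blocks marked as changed, as of the snapshot."""
    return {n: vol.read_snapshot(n)
            for n in changed_blocks(frozen_map, vol.num_blocks)}
```

In this sketch, a host write that arrives after the snapshot is captured does not alter what the backup reads, because the prior contents of the block are preserved first. That extra copy on each first overwrite is the overhead described above.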