Data storage is a critical component of computing. A computing device has a storage area in which data is stored for access by the operating system and applications. In a distributed environment, additional data storage may reside on a separate device that the computing device accesses for regular operations. This kind of data storage is generally referred to as primary storage, in contrast with secondary storage, which computing devices can also access but which is generally used for backups. For data protection purposes, it is important to make regular copies of data from primary storage to secondary storage. While early backup strategies periodically created complete (full) backups, an alternative technique is to transfer only the data modified since the previous backup. By stitching together the newly modified data with a previous complete copy on the secondary storage, a new full backup can be reconstructed.
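The reconstruction of a full backup from incremental data can be illustrated with a minimal sketch. It assumes, purely for illustration, that a backup is modeled as a mapping from block number to block contents; the function name `stitch_full_backup` and the sample data are hypothetical and do not describe any particular storage system.

```python
def stitch_full_backup(previous_full, incremental):
    """Overlay the modified blocks on a copy of the previous full backup.

    Both arguments map a block number to that block's contents
    (an assumed, simplified model of a backup).
    """
    new_full = dict(previous_full)   # start from the previous complete copy
    new_full.update(incremental)     # replace or add only the modified blocks
    return new_full


# Previous full backup held on the secondary storage.
previous_full = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}

# Only the blocks modified since that backup are transferred.
incremental = {1: b"bbbb", 3: b"DDDD"}

new_full = stitch_full_backup(previous_full, incremental)
```

In this sketch only two blocks cross from the primary storage to the secondary storage, yet the result is a complete copy, and the previous full backup is left unchanged.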
Typically, when a backup request is received at a primary storage system, a snapshot of the data to be backed up is captured, and the snapshot is then transmitted from the primary storage system to a secondary storage system while the primary storage system is still receiving further writes from a host. The secondary storage system is also referred to as a target storage system; the terms secondary storage system, secondary storage, and target storage system are used interchangeably within the specification. When the primary storage receives write requests from computing devices, it may write the data to the same locations that are in the process of being backed up to the secondary storage. To ensure data integrity at such locations, the primary storage may perform extra operations such as copy-on-write, which, as part of processing a write request, copies the existing data at a storage location to a corresponding location in the snapshot before the new data overwrites it. Such an operation incurs significant overhead.
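The overhead of copy-on-write can be seen in a minimal sketch. It assumes a volume modeled as an in-memory list of blocks; the class `Volume` and its methods are hypothetical names chosen for illustration, not the interface of any actual storage system.

```python
class Volume:
    """Simplified primary storage volume with one copy-on-write snapshot."""

    def __init__(self, num_blocks, fill=b"\x00"):
        self.blocks = [fill] * num_blocks
        self.snapshot = None  # block index -> data preserved at snapshot time

    def take_snapshot(self):
        # Capturing the snapshot copies nothing up front.
        self.snapshot = {}

    def write(self, index, data):
        # Copy-on-write: the first write to a block after the snapshot must
        # first copy the existing data into the snapshot. This extra copy is
        # the overhead added to processing the write request.
        if self.snapshot is not None and index not in self.snapshot:
            self.snapshot[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Blocks not overwritten since the snapshot are read from the volume.
        if self.snapshot is None:
            raise RuntimeError("no snapshot has been taken")
        return self.snapshot.get(index, self.blocks[index])


vol = Volume(4)
vol.write(0, b"old")
vol.take_snapshot()
vol.write(0, b"new")  # triggers the extra copy of b"old" into the snapshot

live_value = vol.blocks[0]
snapshot_value = vol.read_snapshot(0)
```

Here the host's write to block 0 costs an additional copy before the new data lands, while the snapshot being transmitted to the secondary storage continues to see the data as it was when the snapshot was captured.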