Continuous data protection (“CDP”), also called continuous backup, generally refers to the backup of data on a computer by automatically saving a copy of every change made to that data. While traditional backup solutions take a snapshot of the files or data on a computer at a specific time, CDP essentially captures every new version of the data saved on the computer in real time. CDP may be performed at the file level or the device level. Device-level CDP generally allows a user or administrator to roll back the entire state of the device, such as a disk drive, to any point in time, while file-level CDP may allow a user to view and select a specific version of a particular data file to restore.
File-level CDP is typically implemented through a background service executing on a computer that monitors specified files and folders stored on local or remote storage volumes. When a monitored data file is changed, the new, modified version of the file is copied to one or more backup locations, such as internal storage, an external/removable storage device, and/or a remote storage system, such as a LAN-based storage server or a cloud-based storage service. While each new version of a data file may differ from the previous version by only a small amount, traditional file-level CDP solutions may back up an entire copy of the modified version of the file. As a result, a small data file stored on the storage volume may occupy a disproportionately large amount of space in the backup location.
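By way of illustration only, the whole-file behavior described above may be sketched as follows. The sketch is a minimal Python example under stated assumptions and does not represent any particular CDP product: the names `backup_if_changed`, `backup_size`, and `demo` are hypothetical, change detection is assumed to use a SHA-256 content digest, and the backup location is assumed to be a local directory.

```python
import hashlib
import os
import shutil
import tempfile


def backup_if_changed(path, backup_dir, state):
    """Copy `path` into `backup_dir` as a new version if its content changed.

    `state` (hypothetical) maps a path to (digest, version) from the last backup.
    Returns the path of the new backup copy, or None if nothing changed.
    """
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    last_digest, last_version = state.get(path, (None, 0))
    if digest == last_digest:
        return None
    os.makedirs(backup_dir, exist_ok=True)
    version = last_version + 1
    target = os.path.join(backup_dir, f"{os.path.basename(path)}.v{version}")
    shutil.copy2(path, target)  # entire file is copied, even for a one-byte edit
    state[path] = (digest, version)
    return target


def backup_size(backup_dir):
    """Total bytes occupied by all versions in the backup location."""
    return sum(
        os.path.getsize(os.path.join(backup_dir, name))
        for name in os.listdir(backup_dir)
    )


def demo():
    """Back up a 1000-byte file, append one byte, and back it up again."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "doc.txt")
        bdir = os.path.join(d, "backup")
        with open(src, "wb") as f:
            f.write(b"a" * 1000)
        state = {}
        backup_if_changed(src, bdir, state)
        with open(src, "ab") as f:
            f.write(b"b")
        backup_if_changed(src, bdir, state)
        return os.path.getsize(src), backup_size(bdir)
```

In this sketch, a one-byte change to a 1000-byte file causes a second full copy to be written, so the backup location holds roughly twice the size of the monitored file after a single edit.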
When utilizing a cloud-based storage service as a backup location, this large amount of space may make the cost of CDP prohibitive, since many cloud-based storage services charge a fee based on the amount of storage space utilized. It may be desirable for the CDP process to perform de-duplication of each new version of a monitored data file against previously stored versions in order to remove the duplicate data before backing up the file to the cloud-based storage service. However, de-duplication of the new version of the data file may require a significant amount of I/O against the previous version data stored on the cloud-based storage service. Since many of these services also charge a fee per I/O request or per amount of data transferred, the de-duplication process itself may increase the overall cost of the cloud-based storage service as a backup location.
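The trade-off described above may be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `CountingStore` is a hypothetical in-memory stand-in for a metered storage service (not the API of any real service), de-duplication is assumed to operate on fixed-size 4096-byte blocks compared by SHA-256 digest, and the version manifest is assumed to be kept by the caller.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size


class CountingStore:
    """Hypothetical in-memory store that meters requests and bytes moved."""

    def __init__(self):
        self.objects = {}
        self.requests = 0
        self.bytes_read = 0
        self.bytes_written = 0

    def get(self, key):
        self.requests += 1
        data = self.objects[key]
        self.bytes_read += len(data)
        return data

    def put(self, key, data):
        self.requests += 1
        self.bytes_written += len(data)
        self.objects[key] = data


def split_blocks(data):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def backup_version(store, name, data, previous_manifest=None):
    """Store `data` as version `name`, de-duplicating against a previous version.

    The previous version's blocks are read back from the store so they can be
    hashed and compared; those reads are the metered I/O discussed above.
    Returns a manifest: the list of block keys that make up this version.
    """
    seen = {}
    for key in dict.fromkeys(previous_manifest or []):  # each unique key once
        block = store.get(key)
        seen[hashlib.sha256(block).hexdigest()] = key
    manifest = []
    for i, block in enumerate(split_blocks(data)):
        digest = hashlib.sha256(block).hexdigest()
        key = seen.get(digest)
        if key is None:  # block not present in the previous version: upload it
            key = f"{name}/{i}"
            store.put(key, block)
            seen[digest] = key
        manifest.append(key)
    return manifest


def demo():
    """Back up ten distinct blocks, then a second version with one block changed."""
    v1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
    blocks = split_blocks(v1)
    blocks[3] = bytes([99]) * BLOCK_SIZE
    v2 = b"".join(blocks)
    store = CountingStore()
    m1 = backup_version(store, "v1", v1)
    m2 = backup_version(store, "v2", v2, previous_manifest=m1)
    return store, m2, v2
```

In this sketch, the second version adds only one 4096-byte block to the store rather than ten, but producing it requires reading all ten blocks of the previous version back from the store. Under a pricing model that charges per request or per byte transferred, those reads may offset part of the storage savings.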
It is with respect to these considerations and others that the disclosure made herein is presented.