Data stored on computers may need to be protected from various forms of destruction. Some of these dangers are physical and tangible (for example, failure of a disk drive, fire, or floods), while others are intangible or logical (for example, accidental deletion of files or an attack by a computer virus). Data must be protected from the first category of dangers through physical means, such as remote replication, Redundant Arrays of Inexpensive Disks (RAID), highly available systems, and tape backups. However, such physical failures are relatively rare. The second category, inadvertent erasure or modification of data, is often the more likely cause of data loss.
Storage systems employ various approaches to protect against intangible or logical losses. For example, solutions may employ file versioning, tape backups, or periodic backup to a remote server. Many of these solutions are periodic, meaning that they may run once a day or even less frequently. Consequently, when data needs to be recovered, any data created or modified since the most recent backup may be lost. In the worst case, this is all of the data created during the interval between two backups.
Continuous data protection (CDP) is an increasingly common approach that protects data on an ongoing basis rather than at fixed intervals. In many CDP solutions, a file or folder is backed up whenever it is modified. This is often implemented by making a copy of a file each time the file is closed. At recovery time, a graphical or textual interface may allow the user to browse through the available versions of a file and choose the one he or she would like to recover.
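The copy-on-close scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular CDP product: the class `CopyOnCloseStore` and its methods `on_close`, `list_versions`, and `restore` are hypothetical names, and the sketch assumes that some mechanism (not shown) invokes `on_close` whenever an application closes a protected file.

```python
import os
import shutil
import tempfile


class CopyOnCloseStore:
    """Hypothetical full-copy CDP store: one complete copy per close event."""

    def __init__(self, store_dir: str) -> None:
        self.store_dir = store_dir
        os.makedirs(store_dir, exist_ok=True)
        self._seq = 0  # global counter so stored copy names never collide
        # Maps each protected file path to its ordered list of stored copies.
        self.versions: dict[str, list[str]] = {}

    def on_close(self, path: str) -> str:
        """Store a full copy of `path`; assumed to be called on every close."""
        self._seq += 1
        name = f"{self._seq:06d}_{os.path.basename(path)}"
        copy_path = os.path.join(self.store_dir, name)
        shutil.copy2(path, copy_path)  # the whole file is copied each time
        self.versions.setdefault(path, []).append(copy_path)
        return copy_path

    def list_versions(self, path: str) -> list[str]:
        """Return stored copies of `path`, oldest first, for a browsing UI."""
        return list(self.versions.get(path, []))

    def restore(self, path: str, index: int) -> None:
        """Overwrite `path` with the stored version chosen by the user."""
        shutil.copy2(self.versions[path][index], path)


# Usage: simulate three edit-and-close cycles, then roll back to the first.
with tempfile.TemporaryDirectory() as work:
    doc = os.path.join(work, "report.txt")
    store = CopyOnCloseStore(os.path.join(work, "cdp_store"))
    for text in ["draft 1", "draft 2", "final"]:
        with open(doc, "w") as f:
            f.write(text)
        store.on_close(doc)  # stands in for the close hook
    store.restore(doc, 0)
    with open(doc) as f:
        print(f.read())  # draft 1
```

Because `on_close` copies the entire file on every close, each stored version is as large as the file itself, which is the source of the space overhead discussed next.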
A drawback of many CDP systems is the large amount of storage required to keep multiple versions of files. This overhead arises because the entire file is stored in each version, even if only a small portion of the file has changed. For a 1 GB file, for example, each version is also about 1 GB, so 50 versions occupy about 50 GB, which can be a substantial amount of storage.
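The arithmetic above can be checked with a short calculation. As an illustrative assumption only, the sketch also compares it against a hypothetical lower bound in which one full base copy is kept and each later version stores just its changed bytes (taken here as 10 MB per version, ignoring any encoding overhead); the function names are hypothetical.

```python
GB = 10**9
MB = 10**6


def full_copy_storage(file_size: int, versions: int) -> int:
    """Bytes used when every version is a complete copy of the file."""
    return file_size * versions


def changed_bytes_only(file_size: int, versions: int, changed_per_version: int) -> int:
    """Hypothetical lower bound: one base copy plus only the changed bytes of each later version."""
    return file_size + (versions - 1) * changed_per_version


full = full_copy_storage(1 * GB, 50)
lean = changed_bytes_only(1 * GB, 50, 10 * MB)
print(full / GB)  # 50.0
print(lean / GB)  # 1.49
```

Under these assumptions, full copies use roughly 33 times more space than the changed data alone, which is what motivates the difference-based systems described below.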
Other systems store only the differences between successive versions each time a file is closed. Unfortunately, such version differences can be difficult to browse through. Moreover, because a given version must be rebuilt by applying the stored differences to an earlier copy of the file, substantial computation may be required to reconstruct it. This often makes file recovery needlessly lengthy.
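The reconstruction cost of a difference-based store can be illustrated with a minimal delta chain. This is a sketch under stated assumptions: the `DeltaChain` class and its `(offset, bytes)` patch format are hypothetical and much simpler than real delta encodings such as VCDIFF (RFC 3284). Version 0 is the stored base copy, and version *k* is obtained by applying the first *k* deltas in order.

```python
from dataclasses import dataclass, field

# Hypothetical delta format: a list of (offset, replacement bytes) patches.
Patch = tuple[int, bytes]


@dataclass
class DeltaChain:
    base: bytes
    deltas: list[list[Patch]] = field(default_factory=list)

    def add_version(self, patches: list[Patch]) -> None:
        """Record the differences from the previous version (e.g. on file close)."""
        self.deltas.append(patches)

    def reconstruct(self, version: int) -> tuple[bytes, int]:
        """Rebuild `version` (0 = base); also return how many deltas were applied."""
        if not 0 <= version <= len(self.deltas):
            raise IndexError(f"no version {version}")
        data = bytearray(self.base)
        applied = 0
        for patches in self.deltas[:version]:
            for offset, new_bytes in patches:
                end = offset + len(new_bytes)
                if end > len(data):  # patch extends the file
                    data.extend(b"\x00" * (end - len(data)))
                data[offset:end] = new_bytes
            applied += 1
        return bytes(data), applied


chain = DeltaChain(b"hello world")
chain.add_version([(0, b"H")])   # v1: b"Hello world"
chain.add_version([(6, b"W")])   # v2: b"Hello World"
chain.add_version([(11, b"!")])  # v3: b"Hello World!"
data, applied = chain.reconstruct(3)
print(data, applied)  # b'Hello World!' 3
```

The number of deltas that must be applied grows linearly with the distance between the requested version and the base copy, which is the reconstruction overhead the paragraph above refers to.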
It is with respect to these considerations and others that the disclosure made herein is presented.