Data storage is a critical component of computing. A computing device includes a storage area in which data is stored for access by the operating system and applications. In a distributed environment, additional data storage may be provided by a separate device that the computing device can access for regular operations. In an enterprise environment, the data stored in the storage area of the computing device or in the additional data storage is often copied to one or more offsite storage devices as part of a disaster recovery (DR) strategy that protects the entire organization by keeping one or more copies of the data at offsite locations.
In at least one presently available storage system, files are backed up by (i) creating a full backup of the files on storage media (e.g., disks, SSDs, etc.) of the storage system and (ii) creating one or more periodic full backups thereafter. Each file is stored in the storage system after it has been processed into multiple data structures that represent the data and metadata of the file. These data structures are generally used for accessing, reading, or updating the file. For a single file, a data structure representing a small amount of metadata that is common to a large amount of the file's data can be used to reference, or point to, the multiple data structures representing that large amount of data. This pointing technique, in which a small amount of metadata points to a large amount of data, has the advantage of minimizing the overhead associated with storing metadata in the storage system. The technique, however, is not ideal for all types of backups, for example, incremental backups.
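The pointing technique described above can be illustrated with a minimal sketch. It is not taken from any particular storage system: the names `DataSegment`, `MetadataSegment`, `store_file`, and `read_file`, and the fixed segment size, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSegment:
    """One data structure holding a piece of the file's data."""
    payload: bytes


@dataclass
class MetadataSegment:
    """A single small metadata structure that points to many data segments."""
    children: List[DataSegment]


def store_file(content: bytes, segment_size: int = 4) -> MetadataSegment:
    """Split a file into fixed-size data segments under one metadata segment.

    The fixed segment size is an assumption made for this sketch.
    """
    segments = [
        DataSegment(content[i:i + segment_size])
        for i in range(0, len(content), segment_size)
    ]
    return MetadataSegment(children=segments)


def read_file(meta: MetadataSegment) -> bytes:
    """Reassemble the file by following the pointers held in the metadata."""
    return b"".join(segment.payload for segment in meta.children)
```

In this sketch, a call such as `store_file(b"abcdefghij")` yields one metadata segment referencing three data segments, and `read_file` recovers the original bytes by traversing those references.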
An incremental backup and its variations are generally characterized as backups that store only the data and metadata of the files of a full backup that have changed, without storing the data and metadata of the files of the full backup that have not changed. The changes captured by an incremental backup typically represent only a small proportion of the data of the full backup. This makes incremental backups much smaller and quicker than full backups. Nevertheless, as each incremental backup is stored on the storage system, a full copy of the backed-up files needs to be represented on the storage system in the event that a full restoration of the files is needed. When the pointing technique described above is used for accessing, reading, or updating changes to the data of the files of the full backup, the overhead of updating and storing a small amount of metadata that is common to a large amount of data can be extremely expensive. This is because each time a small subset of the large amount of data is updated, the entirety of the small amount of metadata that is common to the large amount of data must also be updated. In some situations, this can cause the small amount of metadata to be in a perpetual state of being updated. Consequently, updating this small amount of metadata can be as expensive as updating the large amount of data that it references. Furthermore, the continual process of updating the small amount of metadata can create high churn, which can in turn reduce the life expectancy of the storage devices storing or caching the small amount of metadata.
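The update cost described above can be made concrete with a small accounting sketch. It assumes, purely for illustration, that the shared metadata consists of one SHA-256 fingerprint per data segment and that the whole metadata structure is rewritten whenever any referenced segment changes. The class `FileIndex`, its counters, and the constants are hypothetical and do not describe any specific product.

```python
import hashlib
from typing import Dict, List

SEGMENT_SIZE = 4096     # hypothetical data segment size in bytes
FINGERPRINT_SIZE = 32   # size of a SHA-256 digest in bytes


class FileIndex:
    """Tracks bytes written to data and to the shared metadata structure."""

    def __init__(self, segments: List[bytes]) -> None:
        self.segments = list(segments)
        self.fingerprints = [hashlib.sha256(s).digest() for s in self.segments]
        self.data_bytes_written = 0
        self.metadata_bytes_written = 0

    def apply_incremental(self, changes: Dict[int, bytes]) -> None:
        """Apply one incremental backup given as {segment index: new bytes}."""
        for index, new_segment in changes.items():
            self.segments[index] = new_segment
            self.fingerprints[index] = hashlib.sha256(new_segment).digest()
            self.data_bytes_written += len(new_segment)
        if changes:
            # Assumption: the entire shared metadata structure is rewritten
            # once per incremental backup, however few segments changed.
            self.metadata_bytes_written += FINGERPRINT_SIZE * len(self.fingerprints)


# Usage: a file of 1024 segments in which a single segment changes.
index = FileIndex([bytes(SEGMENT_SIZE)] * 1024)
index.apply_incremental({7: b"\x01" * SEGMENT_SIZE})
```

Under these assumptions, changing one 4096-byte segment out of 1024 writes 4096 bytes of data but 32768 bytes of metadata, so the metadata update costs more than the data update it accompanies, and the same metadata is rewritten again on every subsequent incremental backup.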