Deduplicating data systems are often able to reduce the amount of storage space needed to store files by recognizing redundant data patterns. For example, a conventional deduplicating data system may reduce the amount of storage space needed to store similar files by dividing the files into data segments and storing only unique data segments. In this example, each deduplicated file stored within the deduplicating data system may be represented by a list of references to those data segments that make up the file.
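For illustration only, the segment-based approach described above might be sketched as follows. This is a minimal sketch, not the method of any particular deduplicating data system. It assumes fixed-size segmentation and SHA-256 fingerprints, and the names `SegmentStore`, `store_file`, `restore_file`, and `SEGMENT_SIZE` are hypothetical.

```python
import hashlib
from typing import Dict, List

# Hypothetical segment size, kept tiny so the example is easy to follow.
SEGMENT_SIZE = 4


class SegmentStore:
    """Stores each unique data segment once, keyed by its fingerprint."""

    def __init__(self) -> None:
        self.segments: Dict[str, bytes] = {}

    def store_file(self, data: bytes) -> List[str]:
        """Divide a file into segments and return its list of segment references."""
        refs = []
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fingerprint = hashlib.sha256(segment).hexdigest()
            # Only segments that have not been seen before consume new storage.
            self.segments.setdefault(fingerprint, segment)
            refs.append(fingerprint)
        return refs

    def restore_file(self, refs: List[str]) -> bytes:
        """Rebuild a file from its list of segment references."""
        return b"".join(self.segments[fingerprint] for fingerprint in refs)


store = SegmentStore()
refs_a = store.store_file(b"AAAABBBBCCCC")
refs_b = store.store_file(b"AAAABBBBDDDD")  # shares two segments with the first file
```

In this sketch, the two similar files together contain six segments, but only the four unique segments are stored.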
Some deduplicating data systems may store each unique data segment in one of several containers. In some examples, such deduplicating data systems may maintain, for each container, a reference count of how many data objects reference that container (e.g., in order to reclaim storage space by deleting the container when the container is no longer referenced). In these examples, the deduplicating data systems may set an expiry age for containers, after which no new data objects may reference the containers (e.g., to ensure that each container will eventually become unreferenced and may be deleted).
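By way of example, the reference counting and expiry age described above might be sketched as follows. This is a simplified sketch under stated assumptions: time is passed in explicitly as a number, a container is deleted as soon as its reference count returns to zero, and the names `Container`, `ContainerPool`, `add_reference`, and `release_reference` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Container:
    created_at: float
    expiry_age: float
    ref_count: int = 0

    def accepts_new_references(self, now: float) -> bool:
        # After the expiry age has elapsed, no new data objects may reference the container.
        return now - self.created_at < self.expiry_age


class ContainerPool:
    def __init__(self, expiry_age: float) -> None:
        self.expiry_age = expiry_age
        self.containers: Dict[int, Container] = {}
        self._next_id = 0

    def new_container(self, now: float) -> int:
        container_id = self._next_id
        self._next_id += 1
        self.containers[container_id] = Container(now, self.expiry_age)
        return container_id

    def add_reference(self, container_id: int, now: float) -> bool:
        """Record a new referencing data object; refuse if the container has expired."""
        container = self.containers[container_id]
        if not container.accepts_new_references(now):
            return False
        container.ref_count += 1
        return True

    def release_reference(self, container_id: int) -> None:
        """Drop one reference and reclaim the container once it is unreferenced."""
        container = self.containers[container_id]
        container.ref_count -= 1
        if container.ref_count == 0:
            del self.containers[container_id]
```

Because an expired container can only lose references, its reference count can eventually reach zero, at which point the sketch reclaims it.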
To protect against data loss, an organization may use a backup system to back up important data. In order to reduce the resources required to perform each backup, the backup system may perform a full backup of the data, followed by incremental backups capturing changes to the data since the last backup.
Restoring data for a system using incremental backups may require applying changes recorded in one or more incremental backups to data in a full backup. In order to improve performance and/or reduce resource consumption, some backup systems may periodically consolidate the most recent full backup and all subsequent incremental backups into a synthetic backup (e.g., an up-to-date full backup constructed from existing backup data).
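As a rough illustration of the consolidation described above, the following sketch builds a synthetic backup from a full backup and a series of incremental backups. It assumes each backup is modeled as a mapping from file path to a data segment reference, that incremental backups are supplied oldest first, and that a value of `None` marks a deleted file. The function name `synthesize` and this data model are hypothetical.

```python
from typing import Dict, Iterable, Optional


def synthesize(
    full: Dict[str, str],
    incrementals: Iterable[Dict[str, Optional[str]]],
) -> Dict[str, str]:
    """Apply incremental changes, oldest first, to a copy of the full backup."""
    synthetic = dict(full)  # leave the original full backup untouched
    for incremental in incrementals:
        for path, ref in incremental.items():
            if ref is None:
                # Assumed convention: None records that the file was deleted.
                synthetic.pop(path, None)
            else:
                synthetic[path] = ref
    return synthetic


full_backup = {"a.txt": "seg1", "b.txt": "seg2"}
incremental_backups = [
    {"b.txt": "seg3"},                  # b.txt changed
    {"a.txt": None, "c.txt": "seg4"},   # a.txt deleted, c.txt added
]
synthetic_backup = synthesize(full_backup, incremental_backups)
```

Note that the resulting synthetic backup holds only references to data that already exists in the earlier backups, so no backup data is copied in this sketch.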
When stored by deduplicating data systems, synthetic backups may simply reference existing data segments. Unfortunately, synthetic backups may interfere with some space-saving techniques used by traditional deduplicating data systems. Because a synthetic backup may continue to reference a small number of data segments almost indefinitely, and because a container cannot be deleted while any of its data segments remains referenced, traditional deduplicating data systems may retain large segment containers that store just a few referenced data segments over a long period of time.
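To make the resulting waste concrete, the following sketch computes, for each container, the fraction of stored bytes that is still referenced. It is illustrative only: the function name `container_utilization`, the container layout, and the segment sizes are hypothetical.

```python
from typing import Dict, Set


def container_utilization(
    containers: Dict[str, Dict[str, int]],
    referenced: Set[str],
) -> Dict[str, float]:
    """Return, per container, the fraction of stored bytes that is still referenced.

    `containers` maps a container id to a mapping of segment id to segment size in bytes.
    """
    utilization = {}
    for container_id, segments in containers.items():
        total = sum(segments.values())
        live = sum(size for seg_id, size in segments.items() if seg_id in referenced)
        utilization[container_id] = live / total if total else 0.0
    return utilization


# One container holds four equally sized segments, but a long-lived synthetic
# backup still references only one of them.
example_containers = {"c1": {"s1": 100, "s2": 100, "s3": 100, "s4": 100}}
example_utilization = container_utilization(example_containers, {"s1"})
```

In this example, three quarters of the container holds unreferenced data, yet the entire container must be kept for as long as the one remaining data segment is referenced.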
Accordingly, the instant disclosure identifies and addresses a need for additional and improved systems and methods for reclaiming storage space in deduplicating data systems.