Backup refers to copying data to enable data recovery (restoration of original information after a data loss event), archiving (ensuring a particular system version is available if needed subsequently), historical reference (accessing contents of a file as of a particular point in the past), communication of files between systems, and the like. The additional copies are typically called backups and are useful for various purposes including restoring a state following a disaster (called disaster recovery) and restoring small numbers of files after accidental deletion or corruption. Data storage requirements are substantial since a backup system is intended to hold at least one copy of all data worth saving. Organizing storage space and managing the backup process are complicated and may address aspects of geographic redundancy, data security, and portability.
One decision in a backup system is how long to keep backup files. Keeping data for too long can exhaust storage space. Keeping data for too short a time can leave needed versions of a backed-up file unavailable.
In conventional backup systems, the backup client is typically responsible for deciding, when it performs a backup, that some of the information needed to restore certain versions of backed-up files should be deleted, either because the files are too old or because too many versions exist. Each backup location (the logical storage location in which the backup files for a particular system are kept) is assumed to be under the control of a particular client and to be an image of a particular system. One disadvantage is that backups for different systems cannot share anything. A second disadvantage is that recovering prior states of directories is difficult: the typical approach represents a directory by a remote directory and a file by a set of files holding its prior versions, which leaves no way to indicate that a file that was once in the directory is no longer present.
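The client-side deletion decision described above can be sketched as a rule that discards a version when it is either too old or beyond a version count. This is a minimal illustration, not any particular product's policy; the function name `versions_to_delete` and the parameters `max_age` and `max_versions` are hypothetical.

```python
from datetime import datetime, timedelta


def versions_to_delete(version_times, now, max_age, max_versions):
    """Return the version timestamps a client would prune (hypothetical rule).

    A version is pruned if it is older than max_age, or if more than
    max_versions newer-or-equal versions would otherwise be kept.
    """
    newest_first = sorted(version_times, reverse=True)
    doomed = []
    for rank, stamp in enumerate(newest_first):
        too_old = now - stamp > max_age
        too_many = rank >= max_versions
        if too_old or too_many:
            doomed.append(stamp)
    return doomed


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    stamps = [datetime(2024, 1, d) for d in (1, 20, 25, 30)]
    for stamp in versions_to_delete(stamps, now, timedelta(days=14), 2):
        print("delete version from", stamp.date())
```

Because the rule runs on the client at backup time, each backup location is pruned in isolation, which matches the assumption that a location belongs to one client.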
A conventional version control system uses a database to keep track of prior versions of files and directories. It enables browsing of prior versions and provides a consistent view of a directory as of a particular date and time. The system assumes that prior versions are kept forever (or until manually deleted or deleted by rule-based program invocation) and relies on explicit “check-in” of new versions.
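A view of a directory "as of" a given time can be sketched as a lookup over each file's check-in history. This is only an illustration under assumed data structures: the `history` mapping, the function `view_as_of`, and the use of plain integers as timestamps are hypothetical, and a real version control system would keep this information in its database.

```python
from bisect import bisect_right


def view_as_of(history, when):
    """Return {file name: content} as it stood at time `when`.

    history maps each file name to a list of (check-in time, content)
    pairs sorted by time. Files first checked in after `when` are omitted.
    """
    view = {}
    for name, checkins in history.items():
        times = [t for t, _ in checkins]
        # Index of the first check-in strictly later than `when`.
        idx = bisect_right(times, when)
        if idx:
            view[name] = checkins[idx - 1][1]
    return view


if __name__ == "__main__":
    history = {
        "a.txt": [(1, "a1"), (5, "a2")],
        "b.txt": [(3, "b1")],
    }
    print(view_as_of(history, 2))
    print(view_as_of(history, 5))
```

The lookup only works because every version is retained after an explicit check-in, which is the assumption the paragraph above attributes to such systems.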
Some conventional operating systems maintain multiple versions of files by keeping version numbers in the files' metadata, incrementing the version number each time a file is opened for writing, and deleting old files when a maximum number of existing versions is exceeded. The technique works only for files that are modified by generating entirely new content, and sharing between versions is not possible. Neither versioning of directories nor deletion of old versions based on age is supported.
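The version-number scheme described above can be sketched as follows. The class `VersionedFile` and its method names are hypothetical and do not model any specific operating system; the sketch assumes each version stores a complete copy of the content.

```python
class VersionedFile:
    """Toy model of per-file version numbers with a count limit (hypothetical)."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.next_version = 1
        self.versions = {}  # version number -> complete content of that version

    def open_for_write(self, content):
        """Create a new version holding `content` and purge the oldest extras."""
        number = self.next_version
        self.next_version += 1
        self.versions[number] = content  # full copy; nothing shared with older versions
        while len(self.versions) > self.max_versions:
            del self.versions[min(self.versions)]  # drop the lowest version number
        return number


if __name__ == "__main__":
    f = VersionedFile(max_versions=2)
    for text in ("first", "second", "third"):
        f.open_for_write(text)
    print(sorted(f.versions))
```

Purging here depends only on how many versions exist, never on their age, which reflects the limitation noted in the paragraph above.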