Backing up data in a storage system quickly and efficiently becomes more challenging as the amount of data to be backed up continues to grow. With incremental backup, a data management program produces a backup of an entire data collection once. Subsequent backups are typically based only on the additions, modifications, or deletions made to the files since the previous backup.
Traditional implementations of incremental backup identify added, modified, and deleted files in a data collection by scanning each data file in the data collection. If the data collection contains a large number of files, the scan may take a long time. Alternatively, a data management program may implement incremental backup using a journal mechanism that adds an entry to a log each time a file in the data collection is created, modified, or deleted. The journal, however, may be overwhelmed by a high velocity of changes.
For example, if each file in a large data collection with millions of files is renamed, the journal will attempt to log each filename change in a short period of time (i.e., within the context of a single command). Such an operation may cause the journal to cease functioning or shut down. Using a journal may also be inefficient if a single file is changed multiple times.
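The journal behavior described above can be illustrated with a minimal sketch. The `Journal` class, its `record` method, the entry format, and the choice to log a rename as a deletion plus a creation are hypothetical and chosen only for illustration; they are not the interface of any particular data management program. The sketch shows that a bulk rename produces entries for every file, and that repeated changes to a single file each add another entry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Journal:
    """Hypothetical append-only change log: one entry per file operation."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, op: str, path: str) -> None:
        # Every create, modify, or delete appends an entry; nothing is merged.
        self.entries.append((op, path))


def rename_all(journal: Journal, paths: List[str], suffix: str) -> List[str]:
    """Rename every file, logging each rename as a delete plus a create (an assumption)."""
    renamed = []
    for path in paths:
        new_path = path + suffix
        journal.record("deleted", path)
        journal.record("created", new_path)
        renamed.append(new_path)
    return renamed


journal = Journal()
files = [f"/data/file{i}.txt" for i in range(1000)]
rename_all(journal, files, ".bak")
bulk_entries = len(journal.entries)  # grows linearly with the number of files

# A single file modified repeatedly also adds one entry per change.
for _ in range(5):
    journal.record("modified", "/data/report.txt")
```

With millions of files instead of a thousand, the same single command would generate millions of entries in a short period of time, which is the load pattern that may overwhelm a journal.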
Incremental backups rely on a mechanism that provides a list of the files that have changed since the previous backup was created. Accordingly, when a journal is used, the data management program must reconcile files that may have been changed multiple times to arrive at a final file status (e.g., added, modified, or deleted) for each file. Reconciling multiple changes in a large data collection, however, may take longer than simply scanning each file as in a traditional incremental backup.
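The reconciliation step can be sketched as follows, assuming the journal is an ordered list of (operation, path) entries. The function name `reconcile`, the status strings, and the collapsing rules (for example, a file created and then deleted since the previous backup is dropped from the result) are illustrative assumptions, not a method prescribed here.

```python
from typing import Dict, Iterable, Tuple


def reconcile(entries: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse ordered journal entries into one final status per path.

    Statuses are relative to the previous backup: "added", "modified", "deleted".
    """
    status: Dict[str, str] = {}
    for op, path in entries:
        prev = status.get(path)
        if op == "created":
            # Recreated after deletion: the file existed in the last backup,
            # so the net effect is treated as a modification.
            status[path] = "modified" if prev == "deleted" else "added"
        elif op == "modified":
            # A file added since the last backup stays "added" however often it changes.
            status[path] = "added" if prev == "added" else "modified"
        elif op == "deleted":
            if prev == "added":
                # Created and deleted between backups: no net change.
                del status[path]
            else:
                status[path] = "deleted"
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return status


log = [
    ("created", "/data/a.txt"),
    ("modified", "/data/a.txt"),
    ("modified", "/data/b.txt"),
    ("modified", "/data/b.txt"),
    ("deleted", "/data/b.txt"),
    ("created", "/data/c.txt"),
    ("deleted", "/data/c.txt"),
]
final = reconcile(log)
```

In this example, seven journal entries reduce to two final statuses. Every entry must still be visited once, so the cost of reconciliation grows with the number of logged changes rather than with the number of files.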
More efficient systems and methods for incremental backup that overcome the aforementioned shortcomings are therefore needed.