Performing a full backup of the data on a computer is a very costly management task. Typically, it involves enumerating all files on the file system of the computer and backing up each of those files individually. Because the files are scattered unpredictably across the file system and because extracting the metadata associated with each file imposes significant overhead, enumerating all the files during a backup tends to be very slow. Despite the expense, most organizations perform a full backup on a weekly basis, both to limit the time that it takes to recover from a disaster and because of the need to store the datasets created by these backups offsite in case of data center loss (e.g., from fire, flooding, or earthquake).
Incremental or differential backups may be performed between full backups to capture the changes that occur between the full backups. The datasets created by both incremental and differential backups may consume considerable storage resources. A differential backup stores the differences between the file system at the time of the full backup and at the time of the differential backup, while an incremental backup stores only the differences since the most recent backup of either kind. With incremental backups, restoring the files on a computer after a disaster may consume substantially more time, as the dataset created by the full backup may need to be restored first and the datasets created by one or more incremental backups then applied in order.
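The difference in restore cost between the two schemes can be illustrated with a minimal sketch. It is illustrative only and assumes a simplified model in which each backup is a (kind, label) pair listed in chronological order; the function name restore_chain and the day labels are hypothetical and do not correspond to any particular backup product.

```python
from typing import List, Tuple

# Hypothetical model: a backup is (kind, label), where kind is one of
# "full", "incremental", or "differential". History is oldest first.
Backup = Tuple[str, str]


def restore_chain(history: List[Backup]) -> List[str]:
    """Return the labels of the datasets to restore, in order,
    to recover the most recent backed-up state."""
    full_positions = [i for i, (kind, _) in enumerate(history) if kind == "full"]
    if not full_positions:
        raise ValueError("history contains no full backup to restore from")

    # Restoration always starts from the most recent full backup.
    last_full = full_positions[-1]
    chain = [history[last_full][1]]
    later = history[last_full + 1:]

    # A differential holds every change since the full backup, so only
    # the most recent one is needed; anything before it is superseded.
    diff_positions = [i for i, (kind, _) in enumerate(later) if kind == "differential"]
    if diff_positions:
        last_diff = diff_positions[-1]
        chain.append(later[last_diff][1])
        later = later[last_diff + 1:]

    # Each remaining incremental holds only the changes since the backup
    # before it, so all of them must be applied, oldest first.
    chain.extend(label for kind, label in later if kind == "incremental")
    return chain
```

Under these assumptions, a week of incremental backups taken after a Sunday full backup requires every dataset in the chain to be restored, whereas a week of differential backups requires only the full backup and the latest differential, at the cost of each differential dataset growing larger as the week progresses.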
What is needed is a method and system that allows a file system to be fully backed up quickly and efficiently without severely impacting the performance of the computer. Ideally, such a method and system would also provide an efficient mechanism for restoring files to the computer in the case of partial or complete failure of the computer's file system.