In network environments where high availability is a necessity, system administrators constantly face the challenges of preserving data integrity and ensuring the availability of critical system components. One critical component in any computer processing system is its file system. File systems include software programs and data structures that define how the underlying data storage devices are used. They organize disk sectors into files and directories and keep track of which sectors belong to which file and which sectors are unused.
The accuracy and consistency of a file system are necessary to relate applications to their data. However, the potential for data corruption exists in any computer system. Measures are therefore taken to periodically save, or back up, file server state so that the system can be recovered in the event of faults or failures.
Administrators are also looking for ways to recover from user-based errors, where users or applications logically corrupt or delete files contained in file systems. Traditionally, tape-based file system backups have met this need. However, as the cost of disk storage continues to fall, more of these backup needs are being met by disk-based, online representations of the file system that are statically preserved in time. At the time of a loss (say, noon), these views (for instance, the 9 am, 10 am, and 11 am views) can be examined, and individual files can be retrieved from the static views and restored to the primary file system.
One method for backing up a file system is to collect verified snapshots ('snaps') of a consistent file system and to store the snaps as file system checkpoints. When data corruption is detected, or when an object in the file system is logically corrupted or lost, one of the checkpoints can be used for file system recovery. Typically, selecting a checkpoint for recovery involves sifting through a list of available checkpoints and choosing one as the basis for recovery.
As the desire for seamless recovery grows, the number of checkpoints saved for each file system is increasing. As the number of checkpoints increases, the time needed to sort through checkpoint lists becomes a significant contributor to recovery delay.
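As a minimal sketch, the sifting step described above can be pictured as a linear scan over an unsorted checkpoint list that looks for the newest checkpoint taken before the time of loss. The `Checkpoint` record, its fields, and `select_checkpoint` are hypothetical names chosen for illustration and do not describe any particular product. The sketch shows only that every checkpoint in the list is examined, so the cost of the scan grows in proportion to the number of checkpoints kept.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class Checkpoint(NamedTuple):
    """Hypothetical record for one saved file system snapshot."""
    name: str
    taken_at: datetime


def select_checkpoint(checkpoints: Sequence[Checkpoint],
                      time_of_loss: datetime) -> Optional[Checkpoint]:
    """Return the newest checkpoint taken before time_of_loss, or None.

    Every entry in the list is examined once, so the work grows linearly
    with the number of saved checkpoints.
    """
    best: Optional[Checkpoint] = None
    for cp in checkpoints:
        # Keep only checkpoints that predate the loss, and prefer the newest.
        if cp.taken_at < time_of_loss and (best is None or cp.taken_at > best.taken_at):
            best = cp
    return best


if __name__ == "__main__":
    day = datetime(2024, 1, 1)  # arbitrary example date
    views = [
        Checkpoint("view-1000", day.replace(hour=10)),
        Checkpoint("view-0900", day.replace(hour=9)),
        Checkpoint("view-1100", day.replace(hour=11)),
    ]
    loss = day.replace(hour=12)
    print(select_checkpoint(views, loss).name)  # view-1100
```

With hourly views at 9 am, 10 am, and 11 am and a loss at noon, the scan returns the 11 am view. Doubling the number of retained checkpoints doubles the number of entries the scan must examine.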