Backing up or restoring large file systems in a limited amount of time is an increasingly critical problem for enterprises. Typically, backup/restore plays a central role in at least two fundamentally different real-world scenarios: disaster recovery and recovery of a few select files (small-scale recovery).
Disaster recovery involves restoring large amounts of data. Disaster recovery may be necessitated by a hardware failure of the storage subsystem. For example, one or more disks or storage partitions may be lost to an extent that RAID (disk redundancy) protection and file system checks (fsck or chkdsk) are unable to remedy the situation. Alternatively, the problem could be caused by operator error (e.g., an operator may accidentally overwrite a volume). The entire file system can be restored to a storage device that has enough capacity to accommodate the entire file system. This storage device could be smaller than the original (damaged) raw device if the file system was less than full. More likely, the new storage device would be the same size as, or larger than, the one on which the failure occurred. The file system may be taken off-line while the restore is in progress, or it may be partially available.
Select-file recovery typically involves restoring a small amount of data. Typically, this recovery is used after a user realizes that he or she has unintentionally deleted or modified a file. Often, the user wants to retrieve an older copy of the file from backup, but keep the current copy of the file too. It is desirable to be able to place the restored file either in the original file system or in another file system. This process typically involves interactively browsing various backup sets to determine whether the file is found. Select-file recovery may also be used for partial loss due to file system damage. For example, a customer could experience a power outage or a server crash. The system administrator can run a tool like fsck, which will bring the file system to a consistent state, but a few files may be lost. The system administrator may then attempt to recover only the lost/damaged files from backup while the rest of the file system is declared to be healthy and returned to active service.
Unix has traditionally offered several backup/restore methods, including (i) tar, (ii) cpio, and (iii) dump/restore. Tar and cpio operate on mounted file systems, whereas dump operates on raw devices. Dump understands the file system's on-disk structures, and creates a backup stream that contains information about inodes, as well as directory entries (d-entries) and data blocks in use. In Unix® and Linux®, an inode (index node) describes a file (which itself may be a directory). A file can have several names (or no name at all), but it has a unique inode. A d-entry describes one name of a file: it includes the inode of the file in the directory plus the pathname used to find the file.
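The relationship between names and inodes can be illustrated with a minimal Python sketch. It assumes a Unix-like file system that supports hard links; the helper name names_to_inodes and the file names are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def names_to_inodes(directory):
    """Map each name in `directory` to the inode number it refers to."""
    return {
        name: os.lstat(os.path.join(directory, name)).st_ino
        for name in os.listdir(directory)
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report.txt")
        with open(original, "w") as f:
            f.write("quarterly numbers\n")

        # A hard link adds a second name (d-entry) for the same inode.
        os.link(original, os.path.join(d, "report-alias.txt"))

        # Both names map to one inode number; the link count is now 2.
        print(names_to_inodes(d))
        print("link count:", os.stat(original).st_nlink)
```

Both directory entries report the same inode number, which is why a backup that records only names cannot by itself tell whether two paths are one file or two separate copies.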
Restore operates on mounted file systems and does not preserve inode numbers. Restore has an interactive mode, which is a shell-like interface that allows traversing the dumped namespace to mark which directories/files should be restored. Restore can restore to the same file system, or some other file system, and can overwrite existing files, or can be directed to put the restored files into a different directory.
Essentially two kinds of backup/restore strategies have been used: a name-space walk through a mounted file system, and dump and restore. Name-space walk is exemplified in legacy systems by the single-threaded tar and cpio commands, which are file-system independent. Tar and cpio provide their own inverse operations, which can be used for restores. When a backup is performed using name-space walk, a tree is traversed from root to directory to file across all possible branches. Such a name-space walk does not preserve inode numbers or d-entries, but does preserve the names of directories and files. Thus, a restore of a name-space-based backup can write to a new file system.
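A name-space walk of this kind can be sketched in Python with the standard os.walk and tarfile modules. This is an illustrative sketch only, not the implementation of tar or cpio; the function names list_names, namespace_backup, and namespace_restore are hypothetical.

```python
import os
import tarfile
import tempfile


def list_names(root):
    """Return the sorted relative path of every directory and file under root."""
    names = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            names.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(names)


def namespace_backup(root, archive_path):
    """Walk the mounted tree by name and add each entry to a tar archive."""
    with tarfile.open(archive_path, "w") as tar:
        for rel in list_names(root):
            # recursive=False: the walk above, not tarfile, decides what is visited.
            tar.add(os.path.join(root, rel), arcname=rel, recursive=False)


def namespace_restore(archive_path, target):
    """Recreate the archived names under target, which may be a new file system."""
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src")
        dst = os.path.join(work, "restored")
        os.makedirs(os.path.join(src, "docs"))
        os.mkdir(dst)
        with open(os.path.join(src, "docs", "readme.txt"), "w") as f:
            f.write("hello\n")

        archive = os.path.join(work, "backup.tar")
        namespace_backup(src, archive)
        namespace_restore(archive, dst)

        # Names survive the round trip; inode numbers are assigned afresh.
        print(list_names(src) == list_names(dst))
```

Because the archive records only relative names and file contents, the restore target can be any mounted file system with enough space, at the cost of losing the original inode numbers.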
On the other hand, the dump and restore commands walk the file system based on inodes, not names. Dump and restore can work on raw disks and are file system specific. In this case, the restore process takes one pass to discover what is to be restored on the basis of inode information, which may involve examining time stamps indicating changes, modifications, access, etc., and another pass to fix and restore the data. When this is accomplished, the restore re-creates the file system components.
The volume of stored data is increasing, bringing associated problems with it. Even as data storage grows, it is desirable to keep backup times low. Today, it would take about 72 hours to back up a 10 TB file system, assuming a single tape device streaming data at 40 MBytes/sec. This backup window is likely to be unacceptably long, especially given that full level-0 backups are often performed about once every two weeks (e.g., because typically 40-50% of a file system's contents have changed after two weeks). Further, a typical tape can now store only about 80 GBytes of data (though 200 GByte cartridges will soon become available), which means that it could easily take several tapes just to back up a small portion of an enterprise's data. A full level-0 backup of a 10 TB file system would use at least 128 tapes, whereas a typical daily backup of the same file system would use about 12 tapes.
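The backup-window and tape-count figures can be checked with a short calculation. The sketch below assumes binary units (1 TB = 2^40 bytes, 1 GByte = 2^30 bytes, 1 MByte = 2^20 bytes), which the text does not state but which reproduces the quoted 128 tapes; the function names are illustrative.

```python
# Binary unit sizes (an assumption; the text does not say which units it uses).
TIB = 2**40
GIB = 2**30
MIB = 2**20


def backup_hours(fs_bytes, stream_bytes_per_sec):
    """Hours needed to stream fs_bytes to a single device at a constant rate."""
    return fs_bytes / stream_bytes_per_sec / 3600


def tapes_needed(fs_bytes, tape_bytes):
    """Smallest whole number of tapes that can hold fs_bytes (integer ceiling)."""
    return -(-fs_bytes // tape_bytes)


if __name__ == "__main__":
    fs = 10 * TIB
    print(f"backup window: {backup_hours(fs, 40 * MIB):.1f} hours")
    print(f"80 GByte tapes: {tapes_needed(fs, 80 * GIB)}")
    print(f"200 GByte tapes: {tapes_needed(fs, 200 * GIB)}")
```

Under these assumptions the calculation gives roughly 72.8 hours and exactly 128 tapes at 80 GBytes each, falling to 52 tapes with 200 GByte cartridges. With decimal units the results would instead be about 69.4 hours and 125 tapes, so the quoted figures appear to rest on binary units.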