Both the capacity and reliability of data storage devices (e.g., disk drives) used in modern-day computing systems have been steadily increasing over time. However, despite the increased reliability of modern-day storage devices, large and small enterprises alike still require data backups. Implementing a backup scheme for a standard hard drive on a single personal computer is moderately difficult; implementing and managing an enterprise-wide backup scheme, however, can be a serious challenge.
One way large enterprises are meeting this challenge is by implementing backup systems based on the Network Data Management Protocol (NDMP). NDMP is an open standard protocol for controlling backup, recovery, and other transfers of data between primary and secondary storage. The NDMP architecture separates the centralized Data Management Application (DMA), data servers and tape servers participating in archival or recovery operations.
One particularly advantageous feature of an NDMP backup system is Direct Access Recovery (DAR), a mechanism for recovering files. Recovering a selected group of files from a backup image using a standard recovery operation requires sequentially reading all of the tapes that make up the backup image until all of the files have been recovered. In some cases, this may take hours or even days. Using DAR, however, files can be recovered more quickly by reading only the relevant portions of a backup image during a recovery operation.
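The difference between the two recovery styles can be illustrated with a minimal sketch. It assumes a simplified, hypothetical model in which a backup image is an ordered list of files, each occupying a whole number of blocks; the names `Image`, `blocks_read_sequential`, and `blocks_read_direct` are illustrative only and are not part of NDMP.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: a backup image is an ordered list of
# (file name, length in blocks) pairs, laid out back to back.
Image = List[Tuple[str, int]]


def build_offsets(image: Image) -> Dict[str, Tuple[int, int]]:
    """Map each file name to (first block, length in blocks)."""
    offsets: Dict[str, Tuple[int, int]] = {}
    position = 0
    for name, length in image:
        offsets[name] = (position, length)
        position += length
    return offsets


def blocks_read_sequential(image: Image, wanted: Set[str]) -> int:
    """Standard recovery: read from the start until every wanted file is seen."""
    remaining = set(wanted)
    blocks = 0
    for name, length in image:
        if not remaining:
            break
        blocks += length
        remaining.discard(name)
    return blocks


def blocks_read_direct(image: Image, wanted: Set[str]) -> int:
    """Direct access: read only the blocks that belong to the wanted files."""
    offsets = build_offsets(image)
    return sum(offsets[name][1] for name in wanted)


image = [("a.log", 500), ("b.db", 20000), ("c.txt", 2), ("d.cfg", 1)]
wanted = {"c.txt", "d.cfg"}
print(blocks_read_sequential(image, wanted))  # 20503
print(blocks_read_direct(image, wanted))      # 3
```

In this toy layout, the sequential pass reads every block that precedes the last wanted file, while the direct pass touches only the three blocks that hold the selected files.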
During a typical DAR operation, an information systems administrator interacts with the graphical user interface of the DMA (e.g., a backup application) to select one or more files to be restored from a particular backup image. After the user selects the files to restore, the DMA communicates a request to a data server to restore the files from a particular backup image. Along with the request, the DMA also communicates file history information to the data server. The file history information is received and stored at the DMA, after being communicated from a tape server to the DMA at the time the backup image is originally generated. The data server extracts the exact location of each file from the file history information and communicates the information back to the DMA. Next, the DMA communicates a request to the tape server to restore the selected files to a particular file system. Because the request includes the exact location in the backup image of each file to be restored, the recovery operation occurs relatively quickly.
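The exchange described above can be sketched roughly as follows. This is an illustrative model only: the class names, the `FileHistoryEntry` fields, and the use of a byte offset and length to represent a file's "exact location" are assumptions made for the example, and no actual NDMP messages are modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileHistoryEntry:
    """Hypothetical file history record kept by the DMA."""
    path: str
    offset: int  # assumed: byte offset of the file within the backup image
    length: int  # assumed: file length in bytes


class DataServer:
    def locate(self, paths: List[str],
               history: List[FileHistoryEntry]) -> List[FileHistoryEntry]:
        """Extract the location of each requested file from the file history."""
        by_path = {entry.path: entry for entry in history}
        return [by_path[path] for path in paths]


class TapeServer:
    def __init__(self, image: bytes) -> None:
        self.image = image  # stands in for the backup image on tape

    def restore(self, locations: List[FileHistoryEntry]) -> Dict[str, bytes]:
        """Read only the byte ranges named in the request."""
        return {loc.path: self.image[loc.offset:loc.offset + loc.length]
                for loc in locations}


class DMA:
    def __init__(self, history: List[FileHistoryEntry],
                 data_server: DataServer, tape_server: TapeServer) -> None:
        self.history = history  # stored when the backup image was generated
        self.data_server = data_server
        self.tape_server = tape_server

    def restore(self, paths: List[str]) -> Dict[str, bytes]:
        # Step 1: send the request and the file history to the data server.
        locations = self.data_server.locate(paths, self.history)
        # Step 2: ask the tape server to restore using the exact locations.
        return self.tape_server.restore(locations)


history = [
    FileHistoryEntry("/a", 0, 4),
    FileHistoryEntry("/b", 4, 6),
    FileHistoryEntry("/c", 10, 2),
]
dma = DMA(history, DataServer(), TapeServer(b"AAAABBBBBBCC"))
print(dma.restore(["/c", "/a"]))  # {'/c': b'CC', '/a': b'AAAA'}
```

The point of the sketch is the division of labor: the DMA holds the file history, the data server turns it into locations, and the tape server reads only those locations.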
Despite the superior performance of DAR compared to the standard recovery operation, DAR has many limitations. Current implementations of DAR are incapable of restoring directories. For example, if the DMA sends a request to restore a directory, the data server simply ignores the request and/or reports an error. This inability to restore directories is particularly problematic when a file's attributes (e.g., owner, read/write/modify permission settings, etc.) are dependent upon the attributes of the directory in which it is stored. Some backup applications simply do not allow the user to select a directory to restore, thereby forcing the user to select, within a directory, each and every individual file that the user would like to back up and/or restore. Other backup applications work around this problem by allowing the user to select a directory to restore via the graphical user interface (GUI) of the backup application, and then expanding the directory so that its contents populate the list of files to restore using DAR. Although this approach offers the advantages of DAR for restoring files within a directory, it is problematic because the directories themselves are not properly restored. In particular, the directory and subdirectory attributes are not restored. Without restoring the directory and its attributes, one or more of a file's attributes may not be properly restored. Consequently, this expanded-list approach only works for users for whom restoring the permissions of directories is not an important issue. In addition, because current implementations of DAR are incapable of handling directories, file systems that support data streams, such as Windows NT®, are not fully supported by DAR.
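The expanded-list workaround and its shortcoming can be sketched as follows. The catalog layout, the `mode` attribute, and the function names are hypothetical and chosen only to illustrate the idea; they do not reflect any particular backup application.

```python
from typing import Dict, List

# Hypothetical catalog of a backup image: path -> entry type and attributes.
catalog: Dict[str, Dict[str, object]] = {
    "/proj": {"type": "dir", "mode": 0o750},
    "/proj/src": {"type": "dir", "mode": 0o700},
    "/proj/src/main.c": {"type": "file", "mode": 0o640},
    "/proj/README": {"type": "file", "mode": 0o644},
    "/other/notes.txt": {"type": "file", "mode": 0o644},
}


def expand_directory(catalog: Dict[str, Dict[str, object]],
                     directory: str) -> List[str]:
    """Expand a selected directory into the list of files beneath it."""
    prefix = directory.rstrip("/") + "/"
    return sorted(path for path, entry in catalog.items()
                  if path.startswith(prefix) and entry["type"] == "file")


def skipped_directories(catalog: Dict[str, Dict[str, object]],
                        directory: str) -> List[str]:
    """Directory entries that never appear in the expanded restore list."""
    root = directory.rstrip("/")
    prefix = root + "/"
    return sorted(path for path, entry in catalog.items()
                  if entry["type"] == "dir"
                  and (path == root or path.startswith(prefix)))


print(expand_directory(catalog, "/proj"))     # ['/proj/README', '/proj/src/main.c']
print(skipped_directories(catalog, "/proj"))  # ['/proj', '/proj/src']
```

Only the file entries reach the restore list, so the attributes recorded for `/proj` and `/proj/src` are never requested; any directories needed to hold the restored files would be created without the attributes recorded in the backup.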
Furthermore, some current implementations of DAR are inefficient when restoring files that are physically contiguous on a backup tape. For example, a user will often desire to restore two or more files that are physically contiguous on the backup tape. In many cases, the file boundary between the two files occurs in the middle of a tape block. When restoring the first file, current implementations of DAR will perform a read operation of the entire block, reading to the end of the first file and then some portion of the beginning of the second file. However, when a read operation is performed to restore the second file, a seek operation must be performed to reposition the tape reading mechanism. The beginning of the second file is then read, despite having been previously read during the read operation performed while restoring the first file. Most modern tape drives are optimized in such a way that a seek operation can be very expensive. For example, for some digital linear tape (DLT) drives, a seek operation after a few reads to the previous block can take many seconds to finish. Consequently, the restore operation can take a long time.
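The redundant seek and re-read can be made concrete with a small model. The block size, the file offsets, and the function names below are assumptions chosen for illustration; the sketch only counts operations and does not model real tape drive timing.

```python
from typing import List, Tuple

BLOCK_SIZE = 64 * 1024  # hypothetical tape block size in bytes


def block_span(offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Return the first and last block numbers that a file touches."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return first, last


def naive_restore(files: List[Tuple[int, int]],
                  block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Restore each file independently; return (seeks, blocks read).

    The tape head starts at block 0 and, after a read, rests just past
    the last block read. A seek is counted whenever the head is not
    already at the first block of the next file.
    """
    head = 0
    seeks = 0
    reads = 0
    for offset, length in files:
        first, last = block_span(offset, length, block_size)
        if first != head:
            seeks += 1  # reposition, here backward to a block already read
            head = first
        reads += last - first + 1
        head = last + 1
    return seeks, reads


# Two contiguous files whose boundary (byte 100000) falls inside block 1.
files = [(0, 100_000), (100_000, 100_000)]
print(block_span(*files[0]))  # (0, 1)
print(block_span(*files[1]))  # (1, 3)
print(naive_restore(files))   # (1, 5)
```

The two files together cover four distinct blocks (0 through 3), yet the independent restores perform five block reads and one backward seek, because block 1 is read once for each file.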