A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage (NAS) environment, a storage area network (SAN), and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on volumes as a hierarchical structure of data containers, such as files and logical units. For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn). The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although it is not necessarily, associated with its own file system.
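The relationship between per-file fbns and volume-wide vbns can be illustrated with a minimal sketch. The class and method names below (`Volume`, `File`, `allocate`, `append_block`, `read_block`) are hypothetical and are not drawn from any particular file system; the sketch only assumes that each file keeps a map from its own fbn sequence to vbns allocated from a shared volume address space.

```python
class Volume:
    """Hypothetical volume that hands out volume block numbers (vbns)."""

    def __init__(self):
        self.blocks = {}   # vbn -> block data
        self.next_vbn = 0  # next unused vbn in the volume-wide address space

    def allocate(self, data):
        """Store data in a fresh block and return its vbn."""
        vbn = self.next_vbn
        self.next_vbn += 1
        self.blocks[vbn] = data
        return vbn


class File:
    """Hypothetical file whose blocks are numbered 0, 1, 2, ... (fbns)."""

    def __init__(self, volume):
        self.volume = volume
        self.block_map = []  # list index is the fbn, value is the vbn

    def append_block(self, data):
        """Append a data block and return the fbn assigned to it."""
        vbn = self.volume.allocate(data)
        self.block_map.append(vbn)
        return len(self.block_map) - 1

    def read_block(self, fbn):
        """Translate the per-file fbn to a vbn and fetch the block."""
        return self.volume.blocks[self.block_map[fbn]]
```

In this sketch two files on the same volume each start at fbn 0, while the vbns behind those fbns are distinct and drawn from the single volume-wide space.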
A known type of file system is a write-anywhere file system that does not over-write data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc., Sunnyvale, Calif.
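The write-anywhere behavior described above can be sketched as follows. This is an illustrative toy model and not the WAFL implementation; `Volume`, `write_new`, and `update_block` are hypothetical names, and the sketch assumes that a file is represented by a simple list mapping each fbn to a vbn.

```python
class Volume:
    """Hypothetical append-only block store; the list index is the vbn."""

    def __init__(self):
        self.blocks = []

    def write_new(self, data):
        """Write data to a previously unused location and return its vbn."""
        self.blocks.append(data)
        return len(self.blocks) - 1


def update_block(volume, block_map, fbn, new_data):
    """Store a dirtied block at a new vbn instead of overwriting the old one.

    block_map is a list mapping fbn -> vbn for a single file.
    Returns (old_vbn, new_vbn).
    """
    old_vbn = block_map[fbn]
    new_vbn = volume.write_new(new_data)
    block_map[fbn] = new_vbn  # the file now points at the new location
    return old_vbn, new_vbn
```

After an update, the block at the old vbn still holds its original contents; only the file's block map changes to reference the newly written location.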
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access data containers, such as files and logical units, stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage system by issuing file-based and block-based protocol messages (in the form of packets) to the system over the network.
The use of magnetic tape or another backup storage device to store a sequential backup of data (files, directories, etc.) from a storage system has been popular for many decades, for example where a client requests that the storage system back up the client's data. According to a typical tape backup, data is stored on tape media in association with a “backup index” that points to the location on the tape or another image structure where the particular directory, file, etc. resides. The index allows the client/user to ascertain the nature of the data stored and to retrieve it when needed for restoration to the storage system (for example, to an active in-core or on-disk file structure). The index traditionally takes the form of a table of contents of the various files and directories stored as backups serially on the tape or other media.
In particular, backup applications (threads) of the storage system generate a catalog of the data/files backed up to a backup storage device (e.g., a tape device) as the data is backed up. This catalog, or “backup history” (or “file history”), for each set of data backed up (e.g., a certain amount of data or a certain file or files) is transmitted to the client requesting the backup, which may use the backup history to create the backup index mentioned above. Upon receipt of the backup history, the client may acknowledge receipt to the backup application. Notably, the backup operation, backup history generation, and the posting (transmission) of the backup history to the client generally occur in lockstep; that is, one step occurs at a time. In other words, the backup thread backs up the data, generates backup history for the data, and sends the backup history to the client.
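The lockstep sequence described above can be sketched as a simple loop. All names here (`Client`, `receive_history`, `lockstep_backup`) and the fields of the history record are hypothetical, chosen only for illustration; a real backup application would exchange these messages over a network protocol rather than through a direct method call.

```python
class Client:
    """Hypothetical client that builds a backup index from backup history."""

    def __init__(self):
        self.backup_index = []

    def receive_history(self, history):
        """Record the history entry and return an acknowledgement."""
        self.backup_index.append(history)
        return True


def lockstep_backup(data_sets, tape, client, log):
    """Back up each data set, one step at a time.

    data_sets: iterable of (name, data) pairs.
    tape: list standing in for the sequential backup storage device.
    log: list that records the order in which steps occur.
    """
    for name, data in data_sets:
        # Step 1: back up the data to the backup storage device.
        tape.append((name, data))
        log.append(("backup", name))

        # Step 2: generate backup history for the data just written.
        history = {"name": name, "size": len(data),
                   "tape_position": len(tape) - 1}
        log.append(("history", name))

        # Step 3: send the history and wait for the client's acknowledgement
        # before moving on to the next data set.
        if not client.receive_history(history):
            raise RuntimeError("client did not acknowledge " + name)
        log.append(("ack", name))
```

The log shows that no data set is backed up until the acknowledgement for the previous one has arrived.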
One problem associated with this lockstep operation, however, is that the backup thread typically waits for a response from the client that acknowledges the backup history prior to backing up a next set of data. Accordingly, any transmission delays and client processing delays for which the backup thread waits are idle time for the backup thread. Thus, the transmission of (and subsequent waiting for a response to) backup history is a significant performance bottleneck. In addition, where multiple backup threads operate on a storage system, e.g., for multiple backup applications running in parallel, the delays associated with transmitting backup history may be substantially increased. Specifically, locks may be taken out by the backup threads that prevent other backup threads from operating, i.e., making other backup threads wait while one backup thread waits for a response from its client. There remains a need, therefore, for an efficient technique to perform backup operations, e.g., for a plurality of backup threads, and to communicate backup history to clients during the backup operations.
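The cost of the waiting described above can be estimated with a rough back-of-the-envelope model. The function names and the timing parameters below are assumptions made for illustration only and do not reflect measurements of any real system; the second function assumes the worst case in which a lock held across the wait forces the backup threads to run one at a time.

```python
def lockstep_cycle(backup_s, transmit_s, client_s):
    """Return (cycle time, idle fraction) for one lockstep iteration.

    backup_s: time spent backing up one set of data.
    transmit_s: time to transmit the history and receive the response.
    client_s: time the client spends processing the history.
    """
    wait_s = transmit_s + client_s   # the backup thread is idle here
    total_s = backup_s + wait_s
    return total_s, wait_s / total_s


def serialized_total(cycles_per_thread, threads, cycle_s):
    """Total time if a lock forces the threads to take turns (worst case)."""
    return cycles_per_thread * threads * cycle_s
```

With illustrative values of 3.0 s of backup work, 0.5 s of transmission, and 0.5 s of client processing, each cycle takes 4.0 s, of which a quarter is idle time; four fully serialized threads performing ten cycles each would then need 160 s in total.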