1. Field of the Invention
The present invention relates to storage systems and, more particularly, to backup and restore of information in such systems.
2. Background Information
A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on volumes as a hierarchical structure of data containers, such as files and logical units. For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn). The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although is not necessarily, associated with its own file system.
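By way of illustration only, the relationship between per-file fbns and volume-wide vbns may be sketched as follows. The class names, the sequential allocator and the 4096-byte block size are assumptions made for this example and are not taken from any particular file system.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK_SIZE = 4096  # assumed block size in bytes; the text does not specify one


@dataclass
class Volume:
    """A logical volume: one vbn space shared by every file on the volume."""
    next_free_vbn: int = 0

    def allocate(self) -> int:
        """Hand out the next unused vbn (a simplistic, hypothetical allocator)."""
        vbn = self.next_free_vbn
        self.next_free_vbn += 1
        return vbn


@dataclass
class File:
    """A file whose blocks are numbered 0..n-1 (fbn); each fbn maps to a vbn."""
    volume: Volume
    fbn_to_vbn: List[int] = field(default_factory=list)

    def append_block(self) -> int:
        """Allocate one more block for the file and return its fbn."""
        self.fbn_to_vbn.append(self.volume.allocate())
        return len(self.fbn_to_vbn) - 1

    def vbn_for_offset(self, byte_offset: int) -> int:
        """Translate a byte offset in the file to the vbn holding that byte."""
        return self.fbn_to_vbn[byte_offset // BLOCK_SIZE]


vol = Volume()
a, b = File(vol), File(vol)
a.append_block()  # file a: fbn 0 -> vbn 0
b.append_block()  # file b: fbn 0 -> vbn 1
a.append_block()  # file a: fbn 1 -> vbn 2
```

In this sketch each file numbers its own blocks from zero, while the vbns of the two files interleave within the single address space of the volume.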
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access data containers stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage system by issuing file-based and block-based protocol messages (in the form of packets) to the system over the network. In response, the storage system may return a data container handle for use by the client to access a data container served by the system.
A plurality of storage systems (nodes) may be interconnected as a cluster to provide a storage system environment configured to service many clients. Each storage system in the cluster may be configured to service one or more volumes, wherein each volume stores one or more data containers. Yet often a large number of data access requests issued by the clients may be directed to a small number of data containers serviced by a particular storage system of the cluster. A solution to such a problem is to distribute the volumes serviced by the particular storage system among all of the storage systems of the cluster. This, in turn, distributes the data access requests, along with the processing resources needed to service such requests, among all of the storage systems, thereby reducing the individual processing load on each storage system.
In addition to distributing the volumes served by a storage system among the storage systems of the cluster, an administrator may relocate the volumes or data containers stored on the volumes among any of the storage systems in the cluster. However, it is desirable to allow a client to continue accessing a relocated data container using its existing data container handle. In order to ensure that relocation of the data container is transparent to the client, the administrator may employ a redirection identifier that indicates to the file system that the requested data container is not stored at the original storage location identified by the data container handle contained in the client access request.
An example of a redirection identifier is a junction that is associated with a storage location and that indicates that data is not stored at the originally-used location but is available at another storage location. Essentially, the junction provides a level of indirection between a storage system and a client accessing a data container served by the system. Junctions are described in further detail in commonly owned U.S. patent application Ser. No. 11/676,894 of Eisler et al., for a SYSTEM AND METHOD FOR ENABLING A DATA CONTAINER TO APPEAR IN A PLURALITY OF LOCATIONS IN A SUPER-NAMESPACE, which was filed on Feb. 20, 2007 (the contents of which are incorporated herein by reference in their entirety).
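By way of illustration only, the level of indirection provided by a junction may be sketched as a lookup table consulted before a request is served. The `Location` tuple, the two dictionaries and the function names are hypothetical and do not represent the junction implementation of the referenced application.

```python
from typing import Dict, NamedTuple


class Location(NamedTuple):
    node: str    # hypothetical storage system name
    volume: str  # hypothetical volume name
    path: str    # path of the data container within the volume


# Stand-in for the data containers actually stored at each location.
containers: Dict[Location, bytes] = {}
# Junctions: original location -> location where the data is now available.
junctions: Dict[Location, Location] = {}


def relocate(old: Location, new: Location) -> None:
    """Move a data container and leave a junction at its original location."""
    containers[new] = containers.pop(old)
    junctions[old] = new


def read(handle: Location) -> bytes:
    """Serve a client request, following junctions until the data is found."""
    seen = set()
    while handle in junctions:
        if handle in seen:
            raise RuntimeError("junction loop detected")
        seen.add(handle)
        handle = junctions[handle]
    return containers[handle]


old = Location("node1", "vol1", "/db/table")
new = Location("node2", "vol7", "/db/table")
containers[old] = b"rows"
relocate(old, new)
data_via_old_handle = read(old)  # the client still presents the original handle
```

The client continues to present the original location, and the redirection is resolved on the storage system side, so the relocation is not visible to the client.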
Another example of a redirection identifier that may provide a level of indirection with respect to a data container served by a storage system is a symbolic link. A symbolic link (“symlink”) is a Unix® structure that, instead of representing a name of a data container, such as a file or directory on a Unix® platform, provides a path descriptor (such as a path name) to that data container. Symlinks are useful because of the flexibility they provide with respect to the locations of data containers on a storage system. In other words, a client can be informed that its data is provided at a location specified by a symlink, and an administrator, when reconfiguring the location of that data, may easily change the content (path descriptor) of that symlink.
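By way of illustration only, the following sketch shows an administrator relocating data and changing the path descriptor of a symlink while the client keeps using the same path. It assumes a POSIX platform on which symlinks can be created; the directory names, file names and the `retarget` helper are hypothetical.

```python
import os
import tempfile


def retarget(link: str, new_target: str) -> None:
    """Point an existing symlink at a new path by renaming a fresh link over it."""
    tmp = link + ".tmp"
    os.symlink(new_target, tmp)
    os.replace(tmp, link)  # the rename acts on the link itself, not its target


def demo():
    with tempfile.TemporaryDirectory() as root:
        old_dir = os.path.join(root, "vol_a")
        new_dir = os.path.join(root, "vol_b")
        os.mkdir(old_dir)
        os.mkdir(new_dir)
        old_file = os.path.join(old_dir, "data")
        new_file = os.path.join(new_dir, "data")
        with open(old_file, "w") as f:
            f.write("v1")

        # The client is only ever told about this path.
        link = os.path.join(root, "client_path")
        os.symlink(old_file, link)
        with open(link) as f:
            before = f.read()

        # The administrator relocates the data and updates the symlink content.
        os.replace(old_file, new_file)
        retarget(link, new_file)
        with open(link) as f:
            after = f.read()
        return before, after, os.readlink(link) == new_file


before, after, points_to_new_location = demo()
```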
A recovery feature provided by the clustered storage system is tape backup for data served by the cluster. Here, the tape is used to restore data that was lost due to a failure in the cluster. Alternatively, the tape backup information can be used on a second file system to provide a mirroring function for redundancy backup for volumes served by a first file system. Thus, information from the first file system may be retrieved and written onto a tape using a backup program (the backup process is sometimes referred to as a “dump”). Subsequently, the information can be read (“restored”) from the tape by a reader, and written onto disks associated with the second file system. The industry standard for tape-based “dump and restore” operations is provided in the Network Data Management Protocol (NDMP), which is an open standard control protocol for enterprise-wide, network-based backup. The NDMP architecture allows network attached storage vendors to back up data of storage devices onto tape drives and tape libraries. The NDMP standard is set forth in an Internet Draft of the Network Working Group of the Internet Engineering Task Force (IETF), September 1997, of Hitz et al. (the contents of which are incorporated herein by reference in their entirety).
The NDMP standard provides a messaging protocol for performing a backup operation using an NDMP client application which controls an NDMP server. The protocol includes a set of XDR-encoded messages that are exchanged over a bi-directional connection, e.g., a TCP/IP connection, and are used to control and monitor the state of the NDMP server and to collect detailed information about the data that is backed up. The storage system, which may be a Unix server, typically executes an NDMP server application. Data is backed up from the storage system to either a local tape drive or to a backup device on a remote storage system. The data is formatted into an image stream in a suitable format, such as the Berkeley Software Distribution (BSD) format, as will be understood by those skilled in the art. BSD, which is also sometimes referred to as Berkeley Unix, is a Unix derivative distributed by the University of California, starting in the 1970s. The name is also used collectively for various more recent descendants of such distributions.
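By way of illustration only, XDR encoding of two basic types may be sketched as follows: an unsigned integer occupies four bytes in big-endian order, and a string is a four-byte length followed by its bytes, zero-padded to a multiple of four. The two-field message composed below is hypothetical and is not the NDMP message header layout.

```python
import struct


def xdr_uint(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as four big-endian bytes."""
    return struct.pack(">I", value)


def xdr_string(text: str) -> bytes:
    """Encode a string as length + bytes + zero padding to a 4-byte boundary."""
    raw = text.encode("ascii")
    padding = (-len(raw)) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding


# Hypothetical message: a sequence number followed by an operation name.
message = xdr_uint(1) + xdr_string("dump")
```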
During the backup, the NDMP server acts as a data server which reads data from disk, and generates an image stream in the specified backup format. More specifically, at the start of each tape, a “tape start” header is created and it is followed by one or more additional headers and data representing the directories from the lowest inode number to the highest, for example. These directories provide the names for the files that follow. After the directories, one or more headers and data representing the non-directory files such as regular files, symlinks, device files, and the like are recorded from lowest inode number to highest. At the end, one or more tape headers stating “tape end” are provided. During the restore, the data server reads the NDMP data stream from the tape and restores it back to a disk.
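By way of illustration only, the ordering of the image stream described above may be sketched as follows. The `Inode` class and the `(header, inode number, data)` record tuples are assumptions made for this example and do not reproduce the actual on-tape format.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Record = Tuple[str, int, bytes]  # (header, inode number, data), hypothetical


@dataclass
class Inode:
    number: int
    kind: str          # "dir", "file", "symlink", "device", ...
    data: bytes = b""


def dump(inodes: List[Inode]) -> Iterator[Record]:
    """Yield records in the order: tape start, directories, other files, tape end."""
    yield ("tape start", 0, b"")
    by_number = sorted(inodes, key=lambda inode: inode.number)
    # Directories first, lowest inode number to highest; they name the files.
    for inode in by_number:
        if inode.kind == "dir":
            yield ("dir", inode.number, inode.data)
    # Then every non-directory file, again lowest inode number to highest.
    for inode in by_number:
        if inode.kind != "dir":
            yield (inode.kind, inode.number, inode.data)
    yield ("tape end", 0, b"")


stream = list(dump([
    Inode(7, "file"),
    Inode(2, "dir"),
    Inode(5, "symlink"),
    Inode(4, "dir"),
]))
order = [(header, number) for header, number, _ in stream]
```

For the four inodes above, the directories with inode numbers 2 and 4 are emitted first, followed by the symlink with inode number 5 and the regular file with inode number 7.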
The industry standard NDMP protocol does not specifically provide for transfer of information related to junctions. In other words, it is not part of the known standard to back up and restore junction file type information. There has been no known way of handling junction file types in backup and restore processes in the standard. Thus, if a junction is encountered as part of the serialized data stream in a restore operation, the restore operation itself may fail because the junction information is not recognized. To rewrite code to place such functionality into the standard for recognition of the junction file type would be disadvantageous, because it may force third party vendors to purchase or rewrite software code to accommodate sending junction information during dump and restore activities. Accordingly, there remains a need for a method and system for backup and restore of junction information in a backup and restore operation.
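By way of illustration only, the failure mode described above may be sketched as a restore routine that recognizes a fixed set of file types and aborts on any other. The record tuples, the set of known types and the `RestoreError` exception are hypothetical and are not drawn from the NDMP standard.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, bytes]  # (header, inode number, data), hypothetical

# File types a conventional restore is assumed to understand in this sketch.
KNOWN_KINDS = {"dir", "file", "symlink", "device"}


class RestoreError(Exception):
    """Raised when the stream contains a record the restore cannot interpret."""


def restore(records: Iterable[Record]) -> Dict[int, Tuple[str, bytes]]:
    """Rebuild an inode table from a stream, rejecting unknown file types."""
    restored: Dict[int, Tuple[str, bytes]] = {}
    for kind, number, data in records:
        if kind in ("tape start", "tape end"):
            continue
        if kind not in KNOWN_KINDS:
            raise RestoreError(f"unrecognized file type {kind!r} at inode {number}")
        restored[number] = (kind, data)
    return restored


stream_with_junction = [
    ("tape start", 0, b""),
    ("dir", 2, b""),
    ("junction", 9, b""),  # a file type the restore does not recognize
    ("tape end", 0, b""),
]
try:
    restore(stream_with_junction)
    restore_failed = False
except RestoreError:
    restore_failed = True
```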