1. Field of the Invention
The present invention relates to networked storage systems and, more particularly, to procedures for maintaining disk location information within such systems.
2. Background Information
A storage system is a computer that provides storage service relating to the organization of information on writeable persistent storage devices, such as memories, tapes or disks. The storage system is commonly deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server including an operating system that implements a file system to logically organize the information as a hierarchical structure of data containers, such as directories and files on, e.g., the disks. Each "on-disk" file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual data for the file. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored.
The file server, or filer, may be further configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the filer. Sharing of files is a hallmark of a NAS system, and is enabled by the semantic level of access that such a system provides to files and file systems. Storage of information on a NAS system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet, that allow clients to remotely access the information (files) on the file server. The clients typically communicate with the filer by exchanging discrete frames or packets of data according to pre-defined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP).
In the client/server model, the client may comprise an application executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. NAS systems generally utilize file-based access protocols; therefore, each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the file system over the network. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS), the Network File System (NFS) and the Direct Access File System (DAFS) protocols, the utility of the filer may be enhanced for networking clients.
A SAN is a high-speed network that enables establishment of direct connections between a storage system and its storage devices. The SAN may thus be viewed as an extension to a storage bus and, as such, an operating system of the storage system enables access to stored information using block-based access protocols over the “extended bus”. In this context, the extended bus is typically embodied as Fibre Channel (FC) or Ethernet media adapted to operate with block access protocols, such as Small Computer Systems Interface (SCSI) protocol encapsulation over FC (FCP) or TCP/IP/Ethernet (iSCSI). A SAN arrangement or deployment allows decoupling of storage from the storage system, such as an application server, and some level of storage sharing at the application server level. There are, however, environments wherein a SAN is dedicated to a single server. When used within a SAN environment, the storage system may be embodied as a storage appliance that manages access to information in terms of block addressing on disks using, e.g., a logical unit number (LUN) in accordance with one or more block-based protocols, such as FCP.
One example of a SAN arrangement, including a multi-protocol storage appliance suitable for use in the SAN, is described in United States Patent Application Publication No. US2004/0030668 A1, filed on Feb. 14, 2004, entitled MULTI-PROTOCOL STORAGE APPLIANCE THAT PROVIDES INTEGRATED SUPPORT FOR FILE AND BLOCK ACCESS PROTOCOLS by Brian Pawlowski et al., which is incorporated herein by reference in its entirety.
It is advantageous for the services and data provided by a storage system, such as a storage node, to be available for access to the greatest degree possible. Accordingly, some storage systems provide a plurality of storage system nodes organized as a cluster, with a first storage system node coupled to and cooperating with a second storage system node. Each storage system node is configured to take over serving data access requests for the other storage system node if the other node fails. The storage nodes in the cluster notify one another of continued operation using a heartbeat signal exchanged over a cluster interconnect and a cluster switching fabric. If one of the storage system nodes detects the absence of a heartbeat from the other storage node over both the cluster interconnect and the cluster switching fabric, a failure of the other node is assumed and a takeover procedure is initiated. The surviving storage node also usually verifies the failure using a mailbox mechanism of the other storage node, in order to confirm that the other storage node has in fact failed, rather than merely the coupling between the cluster nodes.
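The failure-detection rule described above (assume failure only when heartbeats are missing on both paths, then confirm through the mailbox before taking over) can be sketched as follows. This is a minimal, hypothetical illustration: the timeout value, the `PeerStatus` fields and the reduction of the mailbox check to a single boolean are assumptions made for clarity, not details taken from the text or from any particular implementation.

```python
from dataclasses import dataclass

# Hypothetical heartbeat timeout, in seconds; the text does not specify one.
HEARTBEAT_TIMEOUT = 5.0


@dataclass
class PeerStatus:
    last_interconnect_beat: float  # time of the last heartbeat seen on the cluster interconnect
    last_fabric_beat: float        # time of the last heartbeat seen on the cluster switching fabric
    mailbox_updated: bool          # assumption: True if the peer is still writing its mailbox disk


def heartbeat_missing(last_beat: float, now: float) -> bool:
    """A path is considered silent once its last heartbeat is older than the timeout."""
    return now - last_beat > HEARTBEAT_TIMEOUT


def decide_action(status: PeerStatus, now: float) -> str:
    lost_interconnect = heartbeat_missing(status.last_interconnect_beat, now)
    lost_fabric = heartbeat_missing(status.last_fabric_beat, now)
    if not (lost_interconnect and lost_fabric):
        # At least one path still carries heartbeats, so the peer is alive.
        return "peer_alive"
    if status.mailbox_updated:
        # The peer is still active on disk: only the cluster coupling has failed.
        return "link_failure"
    # Both heartbeat paths are silent and the mailbox confirms the failure.
    return "initiate_takeover"
```

Requiring silence on both paths keeps a single cable or switch fault from triggering a takeover, and the mailbox check separates a dead node from a partitioned one.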
Specifically, the mailbox mechanism includes a set of procedures for determining the most up-to-date coordinating information through the use of one or more “master mailbox” disks. Such disks receive messages from the storage node with which they are associated in order to confirm that the node continues to be in communication with the disks and that the node continues to be capable of writing to other disks coupled to that node. Further details on the configuration and operation of the master mailbox disk are provided in commonly-owned U.S. Pat. No. 7,231,489, of Larson et al., for a SYSTEM AND METHOD FOR COORDINATING CLUSTER STATE INFORMATION, issued on Jun. 12, 2007, which is presently incorporated by reference herein in its entirety.
Many such cluster configurations that have a plurality of storage system nodes operate using the concept of partnering (i.e., "partner mode"). Specifically, each storage system node in the cluster is partnered with a second storage system node in such a manner that the partner storage system node is available to take over and provide the services and the data otherwise provided by the second storage system node upon a failure of the second node. That is, upon such a failure, the partner assumes the tasks of processing and handling any data access requests normally processed by the second storage system node. One such example of a partnered storage system cluster configuration is described in U.S. Pat. No. 7,260,737, entitled SYSTEM AND METHOD FOR TRANSPORT-LEVEL FAILOVER OF FCP DEVICES IN A CLUSTER, by Arthur F. Lent, et al., issued on Aug. 21, 2007, the contents of which are hereby incorporated by reference. It is further noted that in such storage system node clusters, an administrator may desire to take one of the storage system nodes offline for a variety of reasons, for example, to upgrade hardware. In such situations, it may be advantageous to perform a "voluntary" user-initiated takeover operation, as opposed to a failover operation. After the takeover operation is complete, the storage system node's data is serviced by its partner until a giveback operation is performed.
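The partner-mode lifecycle described above (takeover, whether by failover or voluntary, followed later by giveback) can be modeled as a small state table recording which node currently serves each node's data. The class and method names below are hypothetical, chosen only for illustration; the sketch is not drawn from the cited patent.

```python
class HAPair:
    """Hypothetical model of two partnered nodes and which node serves each node's data."""

    def __init__(self, node_a: str, node_b: str) -> None:
        self.partner = {node_a: node_b, node_b: node_a}
        # Initially, each node serves its own data.
        self.serving = {node_a: node_a, node_b: node_b}

    def takeover(self, down_node: str) -> None:
        """Failover or voluntary takeover: the partner begins serving down_node's data."""
        partner = self.partner[down_node]
        if self.serving[down_node] != down_node:
            raise RuntimeError(f"{down_node} has already been taken over")
        if self.serving[partner] != partner:
            raise RuntimeError(f"partner {partner} is itself taken over")
        self.serving[down_node] = partner

    def giveback(self, node: str) -> None:
        """Return node's data to node once it is back in service."""
        if self.serving[node] == node:
            raise RuntimeError(f"{node} is not taken over")
        self.serving[node] = node
```

Because each node has exactly one partner, the takeover target is fixed in advance; this one-to-one pairing is what the following paragraphs identify as a limitation in larger clusters.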
In such cases employing a partner mode, additional infrastructure is often required. For example, requests are tracked to determine whether they are partner requests, and applicable data structures are duplicated. Separate data structures or tables describing the data, such as, for example, a volume location database (VLDB), are maintained for the local disks and for the partner disks. In addition, registry files which store options and configuration parameters are also maintained separately in a local registry file and a partner registry file. As will be apparent to those skilled in the art, this results in additional code complexity in many systems. Moreover, if a partner mode is not used, it could be difficult for the takeover node, or for an administrator, to determine which disks have been assigned to a failed partner if the partner's ownership information is not available.
In some storage system architectures, each storage node in the cluster is generally organized as a network element (N-module) and a disk element (D-module). The N-module includes functionality that enables the node to connect to clients over a computer network, while each D-module connects to one or more storage devices, such as the disks. The disks are arranged as one or more aggregates containing one or more volumes. A file system architecture of this type is generally described in U.S. Pat. No. 6,671,773 entitled METHOD AND SYSTEM FOR RESPONDING TO FILE SYSTEM REQUESTS, by M. Kazar et al., issued on Dec. 30, 2003 (the contents of which are incorporated herein by reference in their entirety).
Extensions to such architectures include the assignment of certain functionality to the D-module that was previously performed by the N-module. For example, the N-module is generally responsible for network connectivity, while the D-module performs functions relating to data containers and data access requests to those containers. In some designs, the N and D-module pairs are partnered in such a manner that, during a failover, the surviving N-module and D-module take over network addresses and perform other administrative tasks for the failed N and D-modules. However, in a cluster that lacks a one-to-one pairing between N and D-modules and may have multiple nodes, there is no readily available technique for identifying the resources, e.g., disks, that are to be taken over by one or more nodes and subsequently returned to the previously failed node once it has been brought back into service. Some ownership information is stored on-disk, such as that described in commonly owned U.S. patent application Ser. No. 10/027,457 of Coatney et al., now published as U.S. Patent Application Publication No. US 2003/0120743 on Jun. 26, 2003, which is presently incorporated herein by reference. This on-disk information, however, is not generally utilized for disk reassignment during takeovers, send-home operations and disk topology reconfigurations.
There remains a need, therefore, for a multi-node cluster system that is configured to provide ownership information about the disks served by the nodes in the cluster so that any of the D-modules in the cluster can locate resources served by the cluster, and further such that all or a portion of those resources can be assigned or reassigned to any other D-module in the cluster, or to more than one D-module in the cluster.
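As a purely hypothetical illustration of the kind of cluster-wide ownership information called for above, the sketch below records, for each disk, a "home" D-module and a "current" D-module. Any module consulting such a table could locate a disk's server, spread a failed module's disks across one or more survivors, and later send them home. The table layout, the round-robin placement policy and all names are assumptions for illustration only; they are not presented as the claimed method.

```python
class OwnershipTable:
    """Hypothetical cluster-wide record of the home and current owner of each disk."""

    def __init__(self) -> None:
        self.home: dict[str, str] = {}     # disk -> D-module that normally serves it
        self.current: dict[str, str] = {}  # disk -> D-module currently serving it

    def add_disk(self, disk: str, owner: str) -> None:
        self.home[disk] = owner
        self.current[disk] = owner

    def disks_served_by(self, module: str) -> list[str]:
        """Locate every disk a given D-module is currently serving."""
        return sorted(d for d, m in self.current.items() if m == module)

    def take_over(self, failed: str, survivors: list[str]) -> None:
        """Spread the failed module's disks round-robin (an assumed policy) over the survivors."""
        if not survivors:
            raise ValueError("at least one surviving D-module is required")
        for i, disk in enumerate(self.disks_served_by(failed)):
            self.current[disk] = survivors[i % len(survivors)]

    def send_home(self, module: str) -> None:
        """Return every disk whose home is `module` once that module is back in service."""
        for disk, home in self.home.items():
            if home == module:
                self.current[disk] = module
```

Keeping the home owner separate from the current owner is what allows disks to be reassigned to several modules during a takeover and still be returned precisely to the recovered module.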