1. Field of the Invention
The present invention relates to networked storage systems and, more particularly, to failover protection in clustered storage systems.
2. Background Information
A storage system is a computer that provides storage service relating to the organization of information on writeable persistent storage devices, such as memories, tapes or disks. The storage system is commonly deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server including an operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on, e.g., the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual data for the file. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored.
The file server, or filer, may be further configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the filer. Sharing of files is a hallmark of a NAS system, which is enabled because of the semantic level of access to files and file systems. Storage of information on a NAS system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet, that allow clients to remotely access the information (files) on the file server. The clients typically communicate with the filer by exchanging discrete frames or packets of data according to pre-defined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP).
In the client/server model, the client may comprise an application executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. NAS systems generally utilize file-based access protocols; therefore, each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the file system over the network. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS), the Network File System (NFS) and the Direct Access File System (DAFS) protocols, the utility of the filer may be enhanced for networking clients.
A SAN is a high-speed network that enables establishment of direct connections between a storage system and its storage devices. The SAN may thus be viewed as an extension to a storage bus and, as such, an operating system of the storage system enables access to stored information using block-based access protocols over the “extended bus”. In this context, the extended bus is typically embodied as Fibre Channel (FC) or Ethernet media adapted to operate with block access protocols, such as Small Computer Systems Interface (SCSI) protocol encapsulation over FC (FCP) or TCP/IP/Ethernet (iSCSI). A SAN arrangement or deployment allows decoupling of storage from the storage system, such as an application server, and some level of storage sharing at the application server level. There are, however, environments wherein a SAN is dedicated to a single server. When used within a SAN environment, the storage system may be embodied as a storage appliance that manages access to information in terms of block addressing on disks using, e.g., a logical unit number (LUN) in accordance with one or more block-based protocols, such as FCP.
One example of a SAN arrangement, including a multi-protocol storage appliance suitable for use in the SAN, is described in United States Patent Application Publication No. US2004/0030668 A1, filed on Feb. 14, 2004, entitled MULTI-PROTOCOL STORAGE APPLIANCE THAT PROVIDES INTEGRATED SUPPORT FOR FILE AND BLOCK ACCESS PROTOCOLS by Brian Pawlowski et al.
It is advantageous for the services and data provided by a storage system, such as a storage node, to be available for access to the greatest degree possible. Accordingly, some storage systems provide a plurality of storage system nodes organized as a cluster, with a first storage system node coupled to and cooperating with a second storage system node. Each storage system node is configured to take over serving data access requests for the other storage system node if the other node fails. The storage nodes in the cluster notify one another of continued operation using a heartbeat signal exchanged over a cluster interconnect and a cluster switching fabric. If one of the storage system nodes detects the absence of a heartbeat from the other storage node over both the cluster interconnect and the cluster switching fabric, a failure of the other node is assumed and a takeover procedure is initiated. The surviving storage node also usually uses a mailbox mechanism of the other storage node to confirm that a failure of the other storage node has in fact occurred, rather than simply a failure of the cluster node coupling.
Specifically, the mailbox mechanism includes a set of procedures for determining the most up-to-date coordinating information through the use of one or more “master mailbox” disks. Such disks receive messages from the storage node with which they are associated in order to confirm that the node continues to be in communication with the disks and that the node continues to be capable of writing to other disks coupled to that node. Further details on the configuration and operation of the master mailbox disk are provided in commonly-owned U.S. patent application Ser. No. 10/378,400, of Larson et al., for a SYSTEM AND METHOD FOR COORDINATING CLUSTER STATE INFORMATION, filed on Mar. 3, 2003, which is presently incorporated by reference herein in its entirety.
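By way of illustration only, the failure detection described above may be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of any particular storage system: the names PartnerStatus, partner_heartbeat_lost, mailbox_is_stale and should_take_over, as well as the mailbox timeout value, are hypothetical and are introduced here solely for explanation. The sketch assumes that the surviving node can observe whether a heartbeat was recently seen on each path and the time of the partner's most recent mailbox message.

```python
from dataclasses import dataclass


@dataclass
class PartnerStatus:
    """Hypothetical snapshot of what the surviving node observes about its partner."""
    heartbeat_on_interconnect: bool  # heartbeat recently seen over the cluster interconnect
    heartbeat_on_fabric: bool        # heartbeat recently seen over the cluster switching fabric
    last_mailbox_write: float        # time, in seconds, of the partner's latest mailbox message


def partner_heartbeat_lost(status: PartnerStatus) -> bool:
    """A failure is assumed only if the heartbeat is absent on BOTH paths."""
    return not status.heartbeat_on_interconnect and not status.heartbeat_on_fabric


def mailbox_is_stale(status: PartnerStatus, now: float, timeout: float) -> bool:
    """The partner has not written to its mailbox disk within the (assumed) timeout."""
    return now - status.last_mailbox_write > timeout


def should_take_over(status: PartnerStatus, now: float, mailbox_timeout: float = 10.0) -> bool:
    """Initiate takeover only when heartbeat loss is confirmed by a stale mailbox.

    A fresh mailbox message suggests the partner is still running and that only
    the coupling between the nodes has failed, so no takeover is initiated.
    """
    if not partner_heartbeat_lost(status):
        return False
    return mailbox_is_stale(status, now, mailbox_timeout)
```

In this sketch, a node whose heartbeat is missing on both paths but whose mailbox message is recent is treated as still operating, which reflects the distinction drawn above between a node failure and a failure of the cluster node coupling.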
In some storage system architectures, each storage node in the cluster is generally organized as a network element (N-module) and a disk element (D-module). The N-module includes functionality that enables the node to connect to clients over a computer network while each D-module connects to one or more storage devices, such as the disks. The disks are arranged as one or more aggregates containing one or more volumes. A file system architecture of this type is generally described in United States Patent Application Publication No. US 2002/0116593 entitled METHOD AND SYSTEM FOR RESPONDING TO FILE SYSTEM REQUESTS, by M. Kazar et al. (the contents of which are incorporated herein by reference in their entirety).
Extensions to such architectures include the assignment of certain functionality to the D-module which may have previously been performed by the N-module. For example, the N-module is generally responsible for network connectivity, while the D-module performs functions relating to data containers and data access requests to those containers. In such configurations, it may be desirable to further configure the D-module such that it can perform a recovery procedure, including takeover and giveback operations, independent of the N-module.
Once the failed node has been either replaced or repaired in accordance with the recovery procedure, the failed node is typically brought back into service. Data containers, such as disks and their associated volumes and/or aggregates, previously served by that failed node are “returned” to the now recovered node such that data access requests may once again be served by the recovered node. However, returning a full complement of aggregates and volumes back to the recovered node has a fairly substantial processing performance impact because of the many tasks which are required to be performed during node recovery. For example, RAID assimilations for all of the aggregates are required to bring the aggregates online at once, so that they may be served by the recovered node. Yet, the aggregates are generally not available during performance of these tasks, which can result in noticeable downtime to clients, since service to data access requests is essentially disabled during the recovery procedure. Furthermore, if the recovered node does not reboot after the giveback operation, there may be additional downtime while the problem is detected and addressed.
There remains a need, therefore, for an improved method for giveback of data resources, such as aggregates, volumes and disks, to a previously failed node after recovery of that node, which method does not have a significant adverse impact in terms of processing performance and noticeable downtime to clients.