1. Field of the Invention
The present invention relates to network storage systems and, more particularly, to takeover procedures in clustered storage systems.
2. Background Information
A storage system is a computer that provides storage service relating to the organization of information on writeable persistent storage devices, such as memories, tapes or disks. The storage system is commonly deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server including an operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on, e.g., the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual data for the file. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored.
The file server, or filer, may be further configured to operate according to a client/server model of information delivery to thereby allow many client systems (clients) to access shared resources, such as files, stored on the filer. Sharing of files is a hallmark of a NAS system, which is enabled because of the semantic level of access to files and file systems. Storage of information on a NAS system is typically deployed over a computer network comprising a geographically distributed collection of interconnected communication links, such as Ethernet, that allow clients to remotely access the information (files) on the file server. The clients typically communicate with the filer by exchanging discrete frames or packets of data according to pre-defined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP).
In the client/server model, the client may comprise an application executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. NAS systems generally utilize file-based access protocols; therefore, each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the file system over the network. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS), the Network File System (NFS) and the Direct Access File System (DAFS) protocols, the utility of the filer may be enhanced for networking clients.
It is advantageous for the services and data provided by a storage system, such as a storage system node, to be available for access to the greatest degree possible. Accordingly, some storage systems provide a plurality of storage system nodes organized as a cluster, with a first storage system node connected with a second storage system node. Each storage system node is configured to take over serving data access requests for the other storage system node if the other storage system node fails. The storage nodes in the cluster notify one another of continued operation using a heartbeat signal which is passed back and forth over a cluster interconnect, and over a cluster switching fabric. If one of the storage system nodes detects the absence of a heartbeat from the other storage node over both the cluster interconnect and the cluster switching fabric, a failure is detected and a takeover procedure is initiated. It is noted that the failure is also usually confirmed by the takeover node by checking a master mailbox disk of the other storage node to confirm that it is in fact a failure of the other storage node itself and not simply a failure of the cluster interconnect coupling.
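By way of illustration only, the failure detection logic described above may be sketched as follows. This is a minimal sketch and not an actual implementation; the names PartnerStatus and should_take_over, and the reduction of each check to a Boolean value, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PartnerStatus:
    """Hypothetical snapshot of what one node currently observes about the other."""
    interconnect_heartbeat: bool  # heartbeat seen over the cluster interconnect
    fabric_heartbeat: bool        # heartbeat seen over the cluster switching fabric
    mailbox_alive: bool           # master mailbox disk shows the other node is still active


def should_take_over(status: PartnerStatus) -> bool:
    """Return True only when the other node itself appears to have failed."""
    # A heartbeat on either path means the other node is still operating.
    if status.interconnect_heartbeat or status.fabric_heartbeat:
        return False
    # Both paths are silent: confirm via the mailbox disk that the node,
    # and not merely the interconnect coupling, has failed.
    return not status.mailbox_alive
```

In this sketch, a node that is silent on both paths but still active according to its mailbox disk is treated as a coupling failure, so no takeover is initiated.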
When a node fails in the clustered environment, the need arises to transfer the ownership of a volume from the failing node to another node in order to provide clients with continuous access to the disks. Thus, to readily transfer ownership of the disks when performing a takeover, many cluster configurations include the concept of partnering. Specifically, each storage system node in the cluster is partnered with a second storage system node in such a manner that the partner storage system node is available to take over and provide the services and the data otherwise provided by the second storage system node. The partner assumes the tasks of processing and handling any data access requests normally processed by the second storage system node. It is further noted that in such storage system node clusters, an administrator may desire to take one of the storage system nodes offline for a variety of reasons including, for example, to upgrade hardware. In such situations, it may be advantageous to perform a “voluntary” user-initiated takeover operation, as opposed to a failover operation. After the takeover operation is complete, the storage system node's data is serviced by its partner until a giveback operation is performed.
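The partnering arrangement described above, in which ownership moves to the partner on takeover and returns on giveback, may be sketched as follows. The sketch is illustrative only; the Cluster class, its method names, and the representation of ownership as a simple mapping from volume to node are assumptions made for the example.

```python
from typing import Dict, List


class Cluster:
    """Hypothetical model of partnered nodes and volume ownership."""

    def __init__(self, partners: Dict[str, str], volume_home: Dict[str, str]) -> None:
        self.partners = partners        # node -> its partner node
        self.home = dict(volume_home)   # volume -> node that normally serves it
        self.owner = dict(volume_home)  # volume -> node currently serving it

    def takeover(self, node: str) -> List[str]:
        """Transfer every volume currently served by `node` to its partner."""
        partner = self.partners[node]
        moved = sorted(v for v, o in self.owner.items() if o == node)
        for volume in moved:
            self.owner[volume] = partner
        return moved

    def giveback(self, node: str) -> List[str]:
        """Return to `node` the volumes it normally serves."""
        returned = sorted(
            v for v, h in self.home.items() if h == node and self.owner[v] != node
        )
        for volume in returned:
            self.owner[volume] = node
        return returned
```

The same takeover call would serve both a failover and a “voluntary” user-initiated takeover in this sketch; only the trigger differs.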
It is also noted that, in some storage system architectures, each node in the cluster is generally organized as a network element (N-module) and a disk element (D-module). The N-module includes functionality that enables the node to connect to clients over a computer network, while each D-module connects to one or more storage devices such as the disks of a disk array. A file system architecture of this type is generally described in United States Patent Application Publication No. US 2002/0116593 entitled METHOD AND SYSTEM FOR RESPONDING TO FILE SYSTEM REQUESTS, by M. Kazar et al. (the contents of which are incorporated herein by reference in entirety).
In some configurations, an N-module may be associated with multiple D-modules. If one of such D-modules fails, then any of the other takeover D-modules may perform a takeover and begin to serve data access requests for the failed D-module. In such configurations, it may be desirable to deliver to the N-module a single view of the storage pool that a particular D-module serves, rather than exposing two sets of storage pools to the N-module (i.e., a local image of the disks being served by the takeover D-module, and a set of partner disks).
In either the partner system or the other systems, when a node detects a failure or panic, a takeover procedure is invoked. Further details regarding takeover procedures are provided in commonly owned U.S. patent application Ser. No. 10/764,809 filed on Jan. 26, 2004 by Coatney et al., for a SYSTEM AND METHOD FOR TAKEOVER OF PARTNER RESOURCES IN CONJUNCTION WITH COREDUMP, and in U.S. patent application Ser. No. 10/764,773 filed on Jan. 26, 2004, of Cassell et al., for a SYSTEM AND METHOD OF SELECTION AND COMMUNICATION OF A DISK FOR STORAGE OF A COREDUMP, both of which are incorporated herein by reference in entirety.
Each node is configured such that, when it detects that it is failing, it saves substantially the entire contents of its memory to a spare disk. This procedure is sometimes referred to as a coredump, and the disk to which the contents are saved is referred to herein as the “coredump” disk. In the case of a clustered environment, where more than one node may be able to take control of a given disk set via ownership reservations, the coredump is only directed to owned disks of the failed node. The coredump disk is not otherwise accessible to the takeover node to begin the takeover process. Rather, the coredump disk remains occupied with the actions of the failed node in writing the coredump. As the coredump disk must, typically, be accessed by the takeover node as part of a conventional takeover operation, the takeover node consequently delays the overall takeover process until the failing node completes its coredump. In effect, the takeover process proceeds through two sequential steps: first, the coredump by the failing node is completed, and then takeover by the takeover node occurs. While the two steps (coredump and takeover) proceed, the failure may actually turn from “soft” to “hard,” with the failing node becoming completely inaccessible. This can occur before the takeover process is fully completed. In addition, during this delay, data handled by the failing node is inaccessible to clients, and is not made available again until takeover is complete. It is highly desirable to reduce unavailability of data from a cluster to the greatest extent possible, particularly in a block-based (SAN) environment in which clients are highly vulnerable to data unavailability. For example, if a file server does not respond within a set period of time, the SAN protocol may issue a network-wide panic, which may, in turn, lead to a total network shutdown.
In addition, prior to the coredump procedure, the failing node needs to reset its storage adapters. The storage adapters need to be reset because either the disks or the adapter itself might be in an error state, which would prevent the writing of the coredump information. The reset returns these devices to a known working state. The reset process interrupts all I/O operations to the disks that are attached to the adapter card, and then requires an initialization including an identification and handshake with each device. This whole process can take a minute or more.
Once the reset is complete, the failing node identifies a spare disk for the coredump and then updates a coredump header in an appropriate location in a data structure, e.g., a RAID label of the identified spare disk, to indicate that this disk has been designated as the coredump disk and that a coredump procedure is occurring.
Only then does the takeover node start the takeover procedure, by locating the disk(s) used for the coredump procedure and reading the coredump header on each disk label in a search for the disk that has a coredump header that was updated by the failing node.
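The conventional sequence described above, in which the failing node marks a spare disk and the takeover node then searches the disk labels for that mark, may be sketched as follows. This is an illustrative sketch only; the DiskLabel structure, its fields, and the use of a single Boolean flag to stand in for the coredump header are assumptions and do not reflect any actual on-disk label format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiskLabel:
    """Hypothetical stand-in for the label (e.g., a RAID label) on one disk."""
    name: str
    owner: str                          # node holding the ownership reservation
    is_spare: bool
    coredump_in_progress: bool = False  # stand-in for the coredump header


def mark_coredump_disk(disks: List[DiskLabel], failing_node: str) -> Optional[DiskLabel]:
    """Failing node: pick one of its own spare disks and update its header."""
    for disk in disks:
        if disk.owner == failing_node and disk.is_spare:
            disk.coredump_in_progress = True
            return disk
    return None  # no owned spare disk is available


def find_coredump_disk(disks: List[DiskLabel], failed_node: str) -> Optional[DiskLabel]:
    """Takeover node: read each label, looking for the updated header."""
    for disk in disks:
        if disk.owner == failed_node and disk.coredump_in_progress:
            return disk
    return None
```

Note that the search in find_coredump_disk cannot succeed until mark_coredump_disk has run, which mirrors the sequential dependency between the two nodes described above.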
In many storage configurations, the adapter reset process and the coredump disk identification process can take an excessively long time, i.e., a minute or more. In such a case, the takeover is delayed by this latency, which can turn a soft failure into a hard failure, as noted, may significantly affect client data access requests, and can in some cases result in the takeover being aborted and a potential network shutdown.
There remains a need, therefore, for a system and method that allow ready identification of a coredump disk and simultaneous takeover in a multiple node cluster, without waiting for time-consuming tasks to be completed prior to initiation of the takeover.