A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on volumes as a hierarchical structure of data containers, such as files and logical units (LUs). For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn). The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although is not necessarily, associated with its own file system.
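By way of illustration only, the relationship between per-file fbns and volume-wide vbns described above may be sketched as follows. The class names `Volume` and `File`, the sequential allocator, and the list-based block map are assumptions made for this example and do not describe any particular file system implementation.

```python
class Volume:
    """Hypothetical volume: a single vbn space shared by every file."""

    def __init__(self):
        self.blocks = {}    # vbn -> block data
        self.next_vbn = 0   # next unused vbn (assumed sequential allocation)

    def allocate(self, data):
        """Store data in the next free block and return its vbn."""
        vbn = self.next_vbn
        self.next_vbn += 1
        self.blocks[vbn] = data
        return vbn


class File:
    """Hypothetical file: fbns are assigned per file, starting at 0."""

    def __init__(self, volume):
        self.volume = volume
        self.block_map = []  # list index is the fbn, value is the vbn

    def append_block(self, data):
        """Add a block to the file and return the fbn assigned to it."""
        vbn = self.volume.allocate(data)
        self.block_map.append(vbn)
        return len(self.block_map) - 1

    def read_block(self, fbn):
        """Translate the fbn to a vbn, then read from the volume."""
        return self.volume.blocks[self.block_map[fbn]]


vol = Volume()
file_a, file_b = File(vol), File(vol)
file_a.append_block(b"a0")
file_b.append_block(b"b0")
file_a.append_block(b"a1")
```

In this sketch both files begin at fbn 0, while their blocks interleave within the one vbn space of the volume.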
A known type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL®) file system available from NetApp, Inc., Sunnyvale, Calif. The disk arrays can include, for example, all traditional hard drives, flash drives, or a combination of hard drives and flash drives.
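The write-anywhere behavior described above may be sketched, under simplifying assumptions, as follows. The class `WriteAnywhereStore` and its members are hypothetical names chosen for this example; the sketch does not represent the WAFL® implementation, and it omits reclamation of superseded locations.

```python
class WriteAnywhereStore:
    """Hypothetical store that never overwrites a block in place."""

    def __init__(self):
        self.disk = {}       # on-disk location -> data
        self.block_map = {}  # logical block id -> current on-disk location
        self.next_free = 0   # next unused location (assumed sequential)

    def write(self, block_id, data):
        """Write data to a fresh location and repoint the block id to it."""
        location = self.next_free
        self.next_free += 1
        self.disk[location] = data
        self.block_map[block_id] = location
        return location

    def read(self, block_id):
        """Read the data at the block's current location."""
        return self.disk[self.block_map[block_id]]


store = WriteAnywhereStore()
first = store.write("blk", b"old")   # initial write
second = store.write("blk", b"new")  # dirtied block goes to a new location
```

Here the modified block is written to a second location, and the data at the first location is left untouched rather than overwritten.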
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access data containers stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage system by issuing access requests (read/write requests) as file-based and block-based protocol messages (in the form of packets) to the system over the network.
A plurality of storage systems may be interconnected to provide a storage system architecture configured to service many clients. In some embodiments, the storage system architecture provides one or more aggregates and one or more volumes distributed across a plurality of nodes interconnected as a cluster. The aggregates may be configured to contain one or more volumes. The volumes may be configured to store content of data containers, such as files and logical units, served by the cluster in response to multi-protocol data access requests issued by clients. Each node of the cluster includes (i) a storage server (also referred to as a “disk element”) adapted to service a particular aggregate or volume and (ii) a multi-protocol engine (also referred to as a “network element”) adapted to redirect the data access requests to any storage server of the cluster.
In the illustrative embodiment, the storage server of each node is embodied as a disk element and the multi-protocol engine is embodied as a network element. The network element receives a multi-protocol data access request from a client, converts that access request into a cluster fabric (CF) message and redirects the message to an appropriate disk element of the cluster. In some embodiments, the disk element and network element of a node comprise software components that are serviced (e.g., upgraded, re-installed, maintained, repaired, etc.) from time to time.
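As a minimal sketch of the request path described above, the following example shows a network element converting a client request into a CF message and redirecting it to the disk element that services the target volume. The `CFMessage` fields, the volume-to-owner lookup table, and all identifiers are assumptions made for illustration; the actual CF message format is not specified here.

```python
from dataclasses import dataclass


@dataclass
class CFMessage:
    """Hypothetical cluster fabric message; the fields are assumed."""
    target_disk_element: str
    operation: str
    volume: str
    path: str


class DiskElement:
    """Hypothetical disk element that services one or more volumes."""

    def __init__(self, name):
        self.name = name
        self.received = []  # CF messages delivered to this disk element

    def handle(self, message):
        self.received.append(message)
        return f"{self.name} handled {message.operation} on {message.volume}{message.path}"


class NetworkElement:
    """Hypothetical network element that redirects client requests."""

    def __init__(self, volume_owners, disk_elements):
        self.volume_owners = volume_owners  # volume -> disk element name
        self.disk_elements = disk_elements  # disk element name -> DiskElement

    def handle_request(self, operation, volume, path):
        # Convert the client request into a CF message addressed to the
        # disk element that services the volume, then forward it.
        owner = self.volume_owners[volume]
        message = CFMessage(owner, operation, volume, path)
        return self.disk_elements[owner].handle(message)


d1, d2 = DiskElement("d-1"), DiskElement("d-2")
network_element = NetworkElement(
    {"vol1": "d-1", "vol2": "d-2"},
    {"d-1": d1, "d-2": d2},
)
reply = network_element.handle_request("read", "vol2", "/db/data")
```

In this sketch the client need not know which disk element holds `vol2`, because the network element resolves the owner and forwards the message.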
Typically, clients will connect with a node for data-access sessions with the node. During a data-access session with a node, a client may obtain a client identifier (ID) for connecting with the network element and one or more file handles to access files through the disk element. The client ID needs to be produced through a connection authentication procedure and each file handle needs to be produced through an access request validation procedure. The client then uses the client ID and file handles in subsequent access requests sent to the node. The node also stores session data comprising the client ID and file handles of each connected client, so it may recognize the client IDs and file handles sent in the access requests. If the node does not recognize the client ID and file handle in an access request, the node may deny processing of the access request.
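The session handling described above may be sketched as follows. The `Node` class, the boolean stand-ins for the connection authentication and access request validation procedures, and the use of random tokens as identifiers are all assumptions made for this example.

```python
import secrets


class Node:
    """Hypothetical node that stores session data for connected clients."""

    def __init__(self):
        self.sessions = {}  # client ID -> {file handle -> file path}

    def connect(self, credentials_ok):
        """Stand-in for connection authentication; returns a client ID."""
        if not credentials_ok:
            return None
        client_id = secrets.token_hex(8)
        self.sessions[client_id] = {}
        return client_id

    def open_file(self, client_id, path, access_allowed):
        """Stand-in for access request validation; returns a file handle."""
        if client_id not in self.sessions or not access_allowed:
            return None
        handle = secrets.token_hex(8)
        self.sessions[client_id][handle] = path
        return handle

    def access(self, client_id, handle):
        """Process a request only if the client ID and handle are recognized."""
        handles = self.sessions.get(client_id)
        if handles is None or handle not in handles:
            return "denied"
        return "ok"


node = Node()
client_id = node.connect(credentials_ok=True)
file_handle = node.open_file(client_id, "/vol1/report", access_allowed=True)
```

A request carrying the stored client ID and file handle is processed, whereas a request carrying an unrecognized client ID or file handle is denied.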
Typically, servicing of the disk element and network element of a node requires the serviced node to be taken offline, thereby disconnecting any client data-access sessions with the serviced node. Conventionally, upon disconnection from the serviced node, the client will delete the client ID and all file handles, and the serviced node will close all open files accessed by the file handles. Upon reconnection with a failover partner node of the serviced node, the client ID needs to be reproduced through the connection authentication procedure and each file handle needs to be reproduced through the access request validation procedure. Thus, servicing of the disk element and network element of each node typically causes substantial disruption to client data-access sessions. As such, there is a need for a less disruptive way of servicing software components of nodes of a cluster.