Network storage is a common technique for making large amounts of data accessible to multiple users, archiving data, and other purposes. In a network storage environment, a storage server makes data available to client (host) systems by presenting or exporting to the clients one or more logical containers of data. There are various forms of network storage, including network attached storage (NAS) and storage area network (SAN). In a NAS context, a storage server services file-level requests from clients, whereas in a SAN context a storage server services block-level requests. Some storage servers are capable of servicing both file-level requests and block-level requests.
Various protocols can be used to implement network storage. These include a number of “stateful” storage protocols that have gained market adoption and popularity. A “stateful” protocol is a protocol in which the server needs to maintain per-client session state information (hereinafter simply called “session state” or “state”) at least until the applicable session has ended. Session state may include, for example, the following information elements: client network information (Internet Protocol (IP) addresses, Transmission Control Protocol (TCP) ports, the client protocol version, and the protocol features available on the client side); authentication information; metadata for open filesystems; metadata for open files; and per-file locking/synchronization metadata (e.g., as needed to keep file data consistent while two different clients share the file).
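As a rough illustration of the information elements listed above, the following Python sketch models per-client session state as an in-memory record. All class and field names here (`SessionState`, `SessionTable`, `FileLock`, and so on) are hypothetical and are not taken from CIFS, NFSv4, or any particular server implementation; the sketch only shows the kind of data a stateful server might have to hold for each client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileLock:
    """Per-file locking metadata (hypothetical shape, for illustration only)."""
    owner_client_id: str
    exclusive: bool


@dataclass
class SessionState:
    """Per-client session state that a stateful server might keep."""
    client_ip: str
    client_tcp_port: int
    protocol_version: str
    client_features: List[str] = field(default_factory=list)
    auth_token: Optional[str] = None
    open_filesystems: Dict[str, dict] = field(default_factory=dict)
    open_files: Dict[str, dict] = field(default_factory=dict)
    file_locks: Dict[str, FileLock] = field(default_factory=dict)


class SessionTable:
    """In-memory table of sessions keyed by a client identifier.

    Because the table lives only in server memory, its contents are
    lost if the server process or machine fails.
    """

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionState] = {}

    def open_session(self, client_id: str, state: SessionState) -> None:
        self._sessions[client_id] = state

    def lookup(self, client_id: str) -> Optional[SessionState]:
        return self._sessions.get(client_id)

    def close_session(self, client_id: str) -> None:
        # Removing an unknown client is a no-op.
        self._sessions.pop(client_id, None)
```

In this sketch the server must keep one `SessionState` entry per connected client until `close_session` is called, which is the memory cost that a stateless protocol avoids.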
Prominent among the popular stateful storage protocols are the Common Internet File System (CIFS) protocol and the Network File System (NFS) version 4 (NFSv4) protocol. CIFS is currently the most popular NAS protocol in Microsoft® Windows®-based datacenters. Because the server must maintain per-client session state, stateful protocols tend to require more memory and greater implementation complexity on the server than stateless protocols.
Stateful protocols also suffer from a distinct disadvantage in the event of a server failure. The session information is critical for the client to have uninterrupted access to storage, yet a server failure purges that information, leaving storage resources unavailable to the client. Once the server reboots, the client is forced to detect the reboot and establish a new session. After the new session is established, the client has to perform initialization packet exchanges, which add further delay.
Most NAS storage protocols, including the stateful ones, are based on a request-response methodology. The client makes a request and expects a response within a specified amount of time. If the client does not receive a response within the specified time, it retransmits the request a limited number of times before deeming the server unavailable. If the server is able to respond within the timeout interval, the client is able to continue with the interaction without major changes.
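The request-response behavior described above can be sketched as a small retry loop. This is a minimal sketch under stated assumptions: the `send` callable is a hypothetical stand-in for the client's transport (it takes a timeout in seconds and either returns the response or raises `TimeoutError`), and the default timeout and retry count are arbitrary illustrative values, not figures from any protocol specification.

```python
from typing import Callable


class ServerUnavailable(Exception):
    """Raised when the client gives up on the server."""


def request_with_retries(send: Callable[[float], bytes],
                         timeout_s: float = 1.0,
                         max_retries: int = 3) -> bytes:
    """Send a request, retransmitting on timeout up to max_retries times.

    `send` is a hypothetical transport hook: it must return the response
    bytes, or raise TimeoutError if no response arrives within timeout_s.
    """
    attempts = 1 + max_retries  # the original transmission plus retransmissions
    for _ in range(attempts):
        try:
            return send(timeout_s)
        except TimeoutError:
            continue  # no response in time; retransmit
    raise ServerUnavailable(f"no response after {attempts} attempts")
```

A server that answers on any attempt lets the caller proceed as if nothing had happened; only when every attempt times out does the client treat the server as unavailable.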
In the context of server failures, the server needs to be able to restart and reach a state in which it can respond to the client in a coherent manner. With current storage server system designs, a server failure (e.g., a hardware failure or software panic) leads to a reboot of the system. After the reboot, hardware initialization, especially the assimilation of the disks belonging to the system, takes an inordinate amount of time relative to the protocol timeout limit. By the time the server is ready to respond, the client will likely have terminated the server session, leading to disruption at the application level on the client.
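The timing relationship can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a simple client model with a fixed per-attempt timeout and no backoff between retransmissions; the function names and the numbers in the usage note are hypothetical and chosen only to illustrate the comparison.

```python
def client_patience_s(timeout_s: float, max_retries: int) -> float:
    """Total time the client waits before declaring the server unavailable.

    Assumes a fixed timeout per attempt and no backoff (a simplification).
    """
    return timeout_s * (1 + max_retries)


def session_survives(recovery_s: float, timeout_s: float, max_retries: int) -> bool:
    """True if the server can respond again before the client gives up."""
    return recovery_s < client_patience_s(timeout_s, max_retries)
```

For example, with a made-up 15-second timeout and 3 retransmissions, the client waits 60 seconds in total, so `session_survives(300.0, 15.0, 3)` is `False` for a reboot that takes five minutes, while a recovery that completes in 20 seconds would preserve the session.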