As the number of computing devices increases across society, electronic data management has become increasingly challenging. Modern devices create and use ever-increasing amounts of electronic data, ranging from digital photos and videos to large data sets related to any number of topics including energy exploration, human resources, seismic activity, and gene research. This explosion in digital data has naturally led to ever larger amounts of data that must be stored. Correspondingly, the data storage field is under constant pressure to increase the size, performance, accessibility, reliability, security, and efficiency of data storage systems.
In order to meet this demand for data storage, various storage systems have been developed. Large scale storage systems often include storage appliances that include arrays of hard drives and other forms of memory and storage. Multiple storage appliances may be networked together to form a cluster. A cluster of storage appliances allows for adding capacity as well as adding redundancy. Storage appliances in a cluster may be configured to mirror data so that if one of the storage appliances becomes inoperable for any reason, the data is still available at another location.
Referring to FIG. 1, a storage network 100 is depicted. This storage network 100 includes one or more storage appliances 110, 120 each including one or more disk drives 112, 122. The storage network 100 is accessible by clients 130, 132, 134, 136 using a network 140. Generally speaking, the storage appliance (or appliances) manages the storage of data on the disk drives 112, 122. The depicted networks may be local in nature or geographically dispersed such as with large private enterprise networks or the Internet.
The storage appliances 110, 120 may include any conventional storage appliances such as ZFS storage appliances. ZFS is a combined file system and volume manager designed by Sun Microsystems® in 2005 that allows for data integrity verification and repair, high storage capacities, along with numerous other features. ZFS based systems utilize storage pools constructed of virtual devices (often referred to as vdevs) constructed of block devices, in this case the disk drives 112, 122. A block device is any device that moves data in the form of blocks. This includes hard disk drives, flash drives, and other addressable regions of memory. A virtual device may span a number of block devices and a pool may include one or more vdevs, each including one or more partitions of hard drives or one or more hard drives.
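The containment relationship described above (a pool made of vdevs, each spanning one or more block devices) can be pictured with a minimal sketch. The class names, device names, and capacities below are hypothetical illustrations and are not part of ZFS or of any storage appliance interface; capacity is computed as a raw sum only, ignoring any redundancy layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockDevice:
    """A device that moves data in blocks, e.g. a disk drive or a partition."""
    name: str
    capacity_bytes: int


@dataclass
class VirtualDevice:
    """A vdev: a grouping that spans one or more block devices."""
    devices: List[BlockDevice] = field(default_factory=list)

    def raw_capacity(self) -> int:
        # Raw sum only; a real redundancy layout would expose less usable space.
        return sum(d.capacity_bytes for d in self.devices)


@dataclass
class StoragePool:
    """A pool constructed from one or more vdevs."""
    name: str
    vdevs: List[VirtualDevice] = field(default_factory=list)

    def raw_capacity(self) -> int:
        return sum(v.raw_capacity() for v in self.vdevs)


TB = 10**12  # decimal terabyte, used only to keep the example readable

# Hypothetical pool: one vdev spanning two whole drives, one vdev on a partition.
pool = StoragePool(
    name="pool0",
    vdevs=[
        VirtualDevice([BlockDevice("disk0", 4 * TB), BlockDevice("disk1", 4 * TB)]),
        VirtualDevice([BlockDevice("disk2-part1", 2 * TB)]),
    ],
)
```

In this sketch, `pool.raw_capacity()` simply adds up the member devices of every vdev in the pool.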
Traffic to and from the storage appliances 110, 120 is typically managed by one or more dedicated storage servers located within the appliances. A common protocol employed on the storage appliances 110, 120 for accessing files, directories, and their associated metadata is the network file system, commonly abbreviated “NFS.” NFS is a widely used distributed file system protocol, originally developed by Sun Microsystems in 1984, and currently in version 4 (NFSv4). NFS allows users at the clients 130-136 to access the stored data seamlessly by providing a programming interface that enables the creation and deletion of files, reading and writing of files, performing seeks within a file, creating and deleting directories, managing directory contents, and other file operations. The operating system running on each of the clients 130-136 is configured to utilize the programming interface in order to manage the file system and to manage the interaction between executing applications and the data residing in the storage appliances 110, 120.
In this example, the storage appliances 110, 120 are configured to operate using NFSv4. Generally, NFS systems are configured to manage file-system metadata and provide access to files and directories. The metadata describes the location of the files on the storage appliances' disk drives 112, 122 that the clients 130-136 are attempting to access. NFSv4 is a “stateful” protocol, meaning the storage appliances 110, 120 each maintain a log of current operations being performed by the clients 130-136. This log is often referred to as a “state table.”
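As a rough illustration of what such a log of current client operations might hold, the following sketch keeps an in-memory table keyed by client. The class, its method names, and the client identifiers are hypothetical; an actual NFSv4 server tracks considerably more (for example locks and delegations) in its own internal structures.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class StateTable:
    """In-memory record of the operations each client currently has in progress."""

    def __init__(self) -> None:
        # client id -> set of (path, mode) pairs the client has open
        self._open: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def record_open(self, client_id: str, path: str, mode: str) -> None:
        self._open[client_id].add((path, mode))

    def record_close(self, client_id: str, path: str, mode: str) -> None:
        self._open[client_id].discard((path, mode))

    def operations_for(self, client_id: str) -> List[Tuple[str, str]]:
        # .get avoids creating an empty entry for an unknown client
        return sorted(self._open.get(client_id, set()))

    def clear(self) -> None:
        """Model the loss of in-memory state when the server restarts."""
        self._open.clear()


table = StateTable()
table.record_open("client-130", "/export/data/report.txt", "read")
table.record_open("client-130", "/export/data/log.txt", "write")
table.record_open("client-132", "/export/data/report.txt", "read")
```

Calling `table.clear()` models why state matters across a restart: once the table is emptied, the server no longer knows which files any client had open.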
Each storage appliance 110, 120 is aware of the pools that are being served by each storage appliance 110, 120. Each pool has a corresponding distributed stable storage (DSS) path where the storage server writes persistent data about each client 130-136 when the client first contacts the server. This data may be used to identify data owned by a client if the client becomes disconnected from the storage server or storage appliances 110, 120.
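One way to picture a per-pool DSS path is as a directory in which the server leaves a small persistent record for each client on first contact. The sketch below assumes a one-file-per-client layout; that layout, the function names, and the client identifiers are hypothetical and are not taken from any particular storage server.

```python
import tempfile
from pathlib import Path
from typing import Set


def record_client(dss_path: Path, client_id: str) -> None:
    """Persist a marker for a client the first time it contacts the server."""
    dss_path.mkdir(parents=True, exist_ok=True)
    # touch() is idempotent, so repeated contacts leave a single record.
    (dss_path / client_id).touch()


def known_clients(dss_path: Path) -> Set[str]:
    """Read back the clients recorded on stable storage, e.g. after a restart."""
    if not dss_path.is_dir():
        return set()
    return {p.name for p in dss_path.iterdir() if p.is_file()}


# A temporary directory stands in for the stable storage of a pool named "pool0".
with tempfile.TemporaryDirectory() as root:
    dss = Path(root) / "pool0" / "dss"
    record_client(dss, "client-130")
    record_client(dss, "client-130")  # second contact, no duplicate record
    record_client(dss, "client-132")
    survivors = known_clients(dss)
```

Because the records live on disk rather than in memory, they remain readable after the process that wrote them has gone away, which is the property the DSS path relies on.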
Users witness the statefulness of the system when a storage appliance 110, 120 reboots or undergoes a takeover, a failover, or a failback. A reboot, for example, involves the storage appliance's entire system shutting down using an orderly shutdown procedure. During the shutdown, all of the processes running on the storage appliance 110, 120 are discontinued. After the shutdown is complete, the appliance may or may not be power cycled, all necessary processes and applications may be restarted, and normal operation may be restarted.
A failover or takeover involves two or more storage appliances configured in a cluster. Each of the storage appliances 110, 120, often referred to as “nodes,” exports different resources, pools, and interfaces to the clients 130-136. During a failover or takeover, one of the storage appliances transfers its responsibilities for managing its various resources, pools, and interfaces to the other storage appliance, and the DSS paths are also copied over to the other storage appliance. A failover or takeover is generally triggered when one of the nodes reboots or panics. A failback is the opposite of a failover or takeover. When a failback occurs, a node has been brought back online, and the pools and interfaces that were taken over by the peer node are transferred back to the node that was originally in charge of them. The ability to perform failovers, takeovers, and failbacks is a feature of having multiple storage appliances 110, 120 arranged in a cluster, and it increases the uptime of the system.
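The transfer of responsibilities in a two-node cluster can be sketched as moving resource names between nodes. This is a simplified model under stated assumptions: the `Node` and `Cluster` classes, the node names, and the resource names are all hypothetical, and a real failover also moves network interfaces and stable-storage data rather than just labels.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Node:
    name: str
    online: bool = True
    serving: Set[str] = field(default_factory=set)


class Cluster:
    """Two-node cluster in which each node can take over its peer's resources."""

    def __init__(self, a: Node, b: Node) -> None:
        self.nodes: Dict[str, Node] = {a.name: a, b.name: b}
        # Remember the original owner of each resource so failback can return it.
        self.home: Dict[str, str] = {
            res: n.name for n in (a, b) for res in n.serving
        }

    def _peer(self, name: str) -> Node:
        return next(n for key, n in self.nodes.items() if key != name)

    def failover(self, failed: str) -> None:
        """The peer takes over everything the failed node was serving."""
        node, peer = self.nodes[failed], self._peer(failed)
        node.online = False
        peer.serving |= node.serving
        node.serving = set()

    def failback(self, restored: str) -> None:
        """Return to the restored node the resources it originally owned."""
        node, peer = self.nodes[restored], self._peer(restored)
        node.online = True
        returning = {r for r in peer.serving if self.home[r] == restored}
        peer.serving -= returning
        node.serving |= returning


node_a = Node("node-110", serving={"pool-a", "interface-a"})
node_b = Node("node-120", serving={"pool-b"})
cluster = Cluster(node_a, node_b)
```

Calling `cluster.failover("node-110")` leaves `node_b` serving all three resources, and a later `cluster.failback("node-110")` restores the original split.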
When a reboot, failover, takeover, or failback occurs, whatever action was being performed on a storage appliance 110, 120 is stopped until the reboot, failover, takeover, failback, or other event completes. Once the event completes, each client 130-136 must resend its last state to the system to re-teach the system what the client was doing before the event. If this state is not sent to the system, the system will not know what to do with the client. For example, if the client was downloading data and that download did not complete, the download would not automatically restart. In order to facilitate the re-teaching of the system, a grace period is initiated to allow the clients 130-136 to resend their state data. During the grace period, the system prohibits any new requests for data. To a client 130-136, the grace period causes whatever action the client was performing to stall or become non-responsive until the states have been restored and the grace period ends. The duration of the grace period is defined by the server, with the most commonly used value being 90 seconds. In a typical distributed storage system, routine events such as reboots, failovers, takeovers, and failbacks cause noticeable delays and disruptions to performance while the system goes through the grace period to restore the state tables and resume normal operations.
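The grace-period behavior described above can be sketched as a server that, for a fixed window after an event, accepts resent client state and turns away new requests. The class name, method names, and the "GRACE" and "OK" result strings are hypothetical placeholders rather than protocol error codes; the 90-second default mirrors the commonly used value mentioned above, and the clock is injectable so the behavior can be exercised without waiting.

```python
import time
from typing import Callable, Dict


class GracePeriodServer:
    """Accepts state reclaims, and rejects new requests, during a grace window."""

    def __init__(
        self,
        grace_seconds: float = 90.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._clock = clock
        self._grace = grace_seconds
        # The grace window starts when the server comes back after the event.
        self._restart_at = clock()
        self.reclaimed: Dict[str, str] = {}

    def in_grace(self) -> bool:
        return self._clock() - self._restart_at < self._grace

    def reclaim(self, client_id: str, state: str) -> bool:
        """A client resends its last state; accepted only during the grace period."""
        if not self.in_grace():
            return False
        self.reclaimed[client_id] = state
        return True

    def new_request(self, client_id: str) -> str:
        """New requests stall until the grace period has ended."""
        return "GRACE" if self.in_grace() else "OK"
```

With a fake clock such as `GracePeriodServer(clock=lambda: now[0])`, where `now` is a one-element list, advancing `now[0]` past 90 shows new requests switching from being refused to being served, which is the stall a client observes during the window.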
It is with these and other issues in mind that various aspects of the present disclosure were developed.