As the number of computing devices increases across society, electronic data management has become increasingly challenging. Modern devices create and use ever-increasing amounts of electronic data, ranging from digital photos and videos to large data sets related to any number of topics, including energy exploration, human resources, seismic activity, and gene research. This explosion in digital data has naturally led to ever larger amounts of data that must be stored. Correspondingly, the data storage field is under constant pressure to increase the size, performance, accessibility, reliability, security, and efficiency of data storage systems.
In order to meet this demand for data storage, various storage systems have been developed. Large scale storage systems often include storage appliances that include arrays of spinning hard drives, magnetic tape drives, and solid state drives. Multiple storage appliances may be networked together to form a cluster. A cluster of storage appliances provides added capacity as well as added redundancy. Storage appliances in a cluster may be configured to mirror data so that if one of the storage appliances becomes inoperable for any reason, the data is still available at another location.
Referring to FIG. 1, a storage network 100 is depicted. This storage network 100 includes one or more storage appliances 110, 120 each including one or more disk drives. The storage network 100 is accessible by clients 130, 132, 134, 136 using a network 140. Generally speaking, the storage appliance (or appliances) manages the storage of data on disk drives. The depicted networks may be local in nature or geographically dispersed such as with large private enterprise networks or the Internet.
The storage appliances 110, 120 may include any conventional storage appliance such as a ZFS storage appliance. ZFS is a combined file system and volume manager designed by Sun Microsystems® in 2005 that allows for data integrity verification and repair and high storage capacities, along with numerous other features. ZFS based systems utilize storage pools (often referred to as zpools) constructed of virtual devices (often referred to as vdevs), which are in turn constructed of block devices. A block device is any device that moves data in the form of blocks, including hard disk drives and flash drives. A virtual device may span a number of block devices, and a zpool may include one or more vdevs, each including one or more partitions of hard drives or one or more hard drives.
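The pool, vdev, and block device hierarchy described above can be pictured with a minimal Python sketch. This is an illustrative model only, not ZFS code: the class names `BlockDevice`, `VirtualDevice`, and `StoragePool`, the device names, and the sizes are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockDevice:
    """A device that moves data in blocks, e.g. a hard disk or flash drive."""
    name: str
    size_bytes: int


@dataclass
class VirtualDevice:
    """A vdev: a virtual device spanning one or more block devices."""
    name: str
    devices: List[BlockDevice] = field(default_factory=list)


@dataclass
class StoragePool:
    """A zpool: a storage pool constructed of one or more vdevs."""
    name: str
    vdevs: List[VirtualDevice] = field(default_factory=list)

    def block_devices(self) -> List[BlockDevice]:
        """Flatten the hierarchy into the underlying block devices."""
        return [dev for vdev in self.vdevs for dev in vdev.devices]


# Hypothetical pool with two vdevs of two disks each.
TIB = 1024 ** 4
pool = StoragePool(
    name="pool-0",
    vdevs=[
        VirtualDevice("vdev-0", [BlockDevice("disk0", TIB), BlockDevice("disk1", TIB)]),
        VirtualDevice("vdev-1", [BlockDevice("disk2", TIB), BlockDevice("disk3", TIB)]),
    ],
)
print([dev.name for dev in pool.block_devices()])
# prints ['disk0', 'disk1', 'disk2', 'disk3']
```

The sketch deliberately omits usable-capacity calculations, since those depend on the redundancy layout of each vdev.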
Traffic to and from the storage appliances 110, 120 is typically managed by one or more dedicated storage servers located within the appliances. A common protocol used for managing these storage appliances 110, 120 is the network file system, commonly abbreviated “NFS.” NFS is a widely used distributed file system protocol, originally developed by Sun Microsystems in 1984, and currently in version 4 (NFSv4). NFS allows users at the clients 130-136 to access the stored data seamlessly by providing a programming interface found on the storage appliances 110, 120. The programming interface enables the creation and deletion of files, reading and writing of files, performing seeks within a file, creating and deleting directories, managing directory contents, and any other file operation. The operating system running on each of the clients 130-136 is configured to utilize the programming interface in order to manage the file system and to facilitate the interaction of executing applications with data residing in the storage appliances 110, 120.
In this example, the storage appliances 110, 120 are configured to operate using NFSv4. Generally, NFS systems are configured to separate the storage of file-system metadata from the storage of the files themselves. The metadata describes the location, on the storage appliances' disk drives, of the files that the clients 130-136 are attempting to access. NFSv4 is a “stateful” protocol, meaning the storage appliances 110, 120 each maintain a log of current operations being performed by the clients 130-136. This log is often referred to as a “state table.”
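As a rough illustration of what a per-appliance log of current client operations might look like, the following Python sketch tracks which files each client currently has open. It is a toy model under stated assumptions: the `StateTable` class, its method names, and the client and path names are hypothetical and do not reflect the actual NFSv4 state structures.

```python
from collections import defaultdict
from typing import Dict, Set


class StateTable:
    """Toy record of the files each client currently holds open."""

    def __init__(self) -> None:
        self._open: Dict[str, Set[str]] = defaultdict(set)

    def record_open(self, client_id: str, path: str) -> None:
        """Note that a client has opened a file."""
        self._open[client_id].add(path)

    def record_close(self, client_id: str, path: str) -> None:
        """Drop the entry; forget the client once it holds nothing."""
        self._open[client_id].discard(path)
        if not self._open[client_id]:
            del self._open[client_id]

    def state_for(self, client_id: str) -> Set[str]:
        """Return a copy of the paths the client currently holds open."""
        return set(self._open.get(client_id, set()))


# Hypothetical client and export paths.
table = StateTable()
table.record_open("client-130", "/export/a.txt")
table.record_open("client-130", "/export/b.txt")
table.record_close("client-130", "/export/a.txt")
print(table.state_for("client-130"))
# prints {'/export/b.txt'}
```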
Each storage appliance 110, 120 is aware of the pools that are being served by each storage appliance 110, 120. Each pool has a corresponding distributed stable storage (DSS) path where the storage server writes persistent data about each client 130-136 when the client first contacts the server. This data may be used to identify data owned by a client if the client becomes disconnected from the storage server or storage appliances 110, 120.
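The idea of writing persistent per-client data on first contact can be sketched as follows. This is a hedged illustration only: the JSON record format, the `record_client` and `known_clients` helpers, the directory layout, and the client identifier and address are assumptions made for the example, not the on-disk format used by any real storage server.

```python
import json
import tempfile
from pathlib import Path


def record_client(dss_path: Path, client_id: str, address: str) -> bool:
    """Persist a record for a client on first contact.

    Returns True if a new record was written, False if one already existed.
    """
    dss_path.mkdir(parents=True, exist_ok=True)
    record = dss_path / f"{client_id}.json"
    if record.exists():
        return False
    record.write_text(json.dumps({"client_id": client_id, "address": address}))
    return True


def known_clients(dss_path: Path) -> list:
    """List the client identifiers that have a persistent record."""
    return sorted(p.stem for p in dss_path.glob("*.json"))


# Hypothetical DSS path for one pool, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    dss = Path(tmp) / "pool-a" / "dss"
    first = record_client(dss, "client-130", "192.0.2.10")
    again = record_client(dss, "client-130", "192.0.2.10")
    clients = known_clients(dss)

print(first, again, clients)
# prints True False ['client-130']
```

Because the record lives with the pool rather than in server memory, it survives a disconnection of the client or a restart of the server, which is the property the paragraph above relies on.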
Two or more storage appliances 110, 120 may be connected to form a cluster. It is common to refer to each storage appliance 110, 120 as a “node.” Each of the nodes exports different resources, pools, and interfaces to the clients 130-136. If one of the nodes in the cluster encounters a problem and is no longer capable of maintaining operations, the operations of the failing node (the storage server portion of the node) may fail over to, or be taken over by, one or more other nodes. In other words, during a failover or takeover, one of the storage appliances transfers its responsibilities for managing its various resources, pools, and interfaces to one of the other storage appliances. Generally speaking, the storage server of the other storage appliance takes over for the storage server of the failing appliance, interacting with the clients and with the storage of the failing appliance. A failover or takeover is generally triggered when one of the nodes reboots or panics. A failback is the opposite of a failover/takeover. When a failback occurs, a node has been brought back online, and the pools and interfaces that were taken over by the peer node are transferred back to the node that was originally in charge of them. The ability to perform failovers/takeovers and failbacks is a feature of having multiple storage appliances 110, 120 arranged in a cluster, and it increases the uptime of a system.
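The transfer of resources during a failover/takeover and their return during a failback can be modeled with a short Python sketch. It is a simplified simulation of a two-node cluster, assuming hypothetical `Node` and `Cluster` classes and made-up resource names; it does not represent the clustering software of any actual storage appliance.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Node:
    """One storage appliance in the cluster."""
    name: str
    online: bool = True
    resources: Set[str] = field(default_factory=set)


class Cluster:
    """Two-node cluster that remembers which node originally owned each resource."""

    def __init__(self, first: Node, second: Node) -> None:
        self.nodes: Dict[str, Node] = {first.name: first, second.name: second}
        self.home: Dict[str, str] = {
            res: node.name for node in (first, second) for res in node.resources
        }

    def peer_of(self, name: str) -> Node:
        return next(n for n in self.nodes.values() if n.name != name)

    def failover(self, failed: str) -> None:
        """Move every resource of the failed node to its peer."""
        node, peer = self.nodes[failed], self.peer_of(failed)
        if not peer.online:
            raise RuntimeError("no online peer available to take over")
        node.online = False
        peer.resources |= node.resources
        node.resources = set()

    def failback(self, restored: str) -> None:
        """Return to the restored node the resources it originally owned."""
        node, peer = self.nodes[restored], self.peer_of(restored)
        node.online = True
        returning = {r for r in peer.resources if self.home[r] == restored}
        peer.resources -= returning
        node.resources |= returning


# Hypothetical pools and interfaces exported by each node.
a = Node("node-110", resources={"pool-a", "if-a"})
b = Node("node-120", resources={"pool-b", "if-b"})
cluster = Cluster(a, b)

cluster.failover("node-110")
print(sorted(b.resources))  # prints ['if-a', 'if-b', 'pool-a', 'pool-b']

cluster.failback("node-110")
print(sorted(a.resources))  # prints ['if-a', 'pool-a']
```

Recording the original owner of each resource is one simple way to make failback the exact inverse of failover; other designs are possible.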
In order to perform unit and regression testing for changes to the software running on the storage appliances 110, 120, a cluster 100 may be required. For example, in order to test the performance of a failover/takeover and failback, multiple storage appliances 110, 120 configured in a cluster 100 are necessary. One or more of the clients 130-136 may be configured as a testing workstation capable of connecting to the cluster 100, or directly to one or more of the storage appliances 110, 120, in order to perform various tests of the system.
These clusters of storage appliances are very expensive, with each node costing thousands, tens of thousands, or hundreds of thousands of dollars. Since simply installing extra clusters for testing is cost-prohibitive, securing time to unit and regression test features requiring a cluster can be difficult.
Several traditional virtualization solutions for storage appliances are available and work well, but they nonetheless suffer from some drawbacks. Current commercial solutions tend to require large amounts of computing resources in order to operate, and using them to simulate the special-purpose hardware of storage appliances is often not possible or very difficult, requiring the building of special-purpose drivers and other time-consuming customizations.
It is with these and other issues in mind that various aspects of the present disclosure were developed.