A storage server is a special-purpose processing system used to store and retrieve data on behalf of one or more client processing systems (“clients”) in a client/server model of information processing and distribution. A storage server can be used for many different purposes, such as to provide multiple users with access to shared data or to back up mission-critical data.
A storage server may operate on behalf of one or more clients to store, manage and/or control shared files in a storage system, such as magnetic or optical storage-based disks or tapes. In a large-scale network, a storage server might be a dedicated network-attached storage (NAS) device that serves as a remote disk drive for other computers on the network. A storage server may include a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each on-disk file may be implemented as a set of data blocks configured to store information, such as text or image data, whereas the directories may be implemented as specially-formatted metadata files in which information about other files and directories is stored. Metadata is data about data. The purpose of metadata is to provide a consistent and reliable means of access to data. The metadata may be stored in a physical location or may be in a virtual database, in which metadata is drawn from separate sources. Metadata may include information about how to access specific data, or specific characteristics of the data, such as size, content or organization. Alternatively, the storage server may provide clients with block-level access to stored data (as opposed to file-level access), such as may be employed in a storage area network (SAN). A SAN is a network that transfers data between computer systems and storage systems via peripheral channels such as SCSI (small computer system interface) or Fibre Channel.
In a client/server system, the client may be an application executing on a computer that communicates with the storage server over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage server by issuing file system protocol messages to the storage server over the network.
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a storage system that manages and/or controls data access and client access requests to storage servers. In this sense, the Data ONTAP™ operating system, available from Network Appliance, Inc., which implements a write anywhere file layout (WAFL™) file system, is an example of such a storage operating system. The operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
A storage server organizes the files on its attached storage system into one or more logical volumes that may span one or more physical storage devices, and “mounts” the logical volumes into the network filing system, defining an overall logical arrangement of storage space. Each volume is associated with its own file system and typically consists of a directory tree with a root directory, subdirectories and files. Mounting a volume makes the files in the volume accessible to network users without reference to a physical device. A volume is mounted by attaching its root directory to a location in a hierarchical network filing system, so that the directories of the mounted volume appear as subdirectories of the network file system. The network file system location where the volume is attached is called a mount point.
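By way of illustration only, the relationship between mount points and volumes described above may be sketched as a lookup from a network path to the volume that holds it. The mount table, the volume names and the function `resolve` are hypothetical and are not drawn from any particular storage operating system; the sketch assumes that the longest matching mount point determines the volume.

```python
import posixpath

def resolve(mounts, path):
    """Map an absolute network path to (volume, path relative to that volume's root).

    `mounts` is a hypothetical mount table: mount point -> volume name.
    The longest mount point that contains the path is chosen.
    """
    path = posixpath.normpath(path)
    best = None
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise LookupError("no volume is mounted for " + path)
    return mounts[best], posixpath.relpath(path, best)

# Hypothetical mount table: a root volume plus a second volume attached at /projects.
mounts = {"/": "vol_root", "/projects": "vol_projects"}
projects_hit = resolve(mounts, "/projects/docs/a.txt")
root_hit = resolve(mounts, "/etc/hosts")
```

In this sketch a user names a file only by its path in the network filing system; which volume, and therefore which physical devices, hold the file is determined entirely by the mount table.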
The disks within a volume are typically organized as one or more redundant arrays of independent (or inexpensive) disks (RAID). RAID implementations enhance the reliability and integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. RAID implementations provide data integrity, i.e., the ability to recover from data write errors and other forms of data corruption. However, if the storage server associated with a RAID group goes offline, all of the volumes in the attached RAID group will be unavailable to the network. Clustered storage server systems (e.g., storage server systems employing the NetApp Cluster Failover application available from Network Appliance, Inc.) have been developed to address this data availability problem.
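As a simplified illustration of how parity information permits recovery of striped data, the following sketch assumes a single XOR parity block per stripe, as used in some RAID levels. The function names and block contents are hypothetical, and the sketch is not a description of any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Build a stripe: the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, lost_index):
    """Reconstruct the block at lost_index from all surviving blocks in the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# One stripe written across four hypothetical disks (three data blocks, one parity block).
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
recovered = rebuild(stripe, 1)  # contents of the block on the second disk
```

Because XOR is its own inverse, any single missing block, whether data or parity, equals the XOR of the remaining blocks in the stripe. This protects against the loss of a disk, but not against the loss of the storage server itself, which is the availability problem noted above.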
Clustering configures two or more storage servers as partners to stand in for each other in the event that one of the storage servers goes offline, a process known as failover. In a clustered storage server configuration, one storage server is able to take over the duties of another storage server (takeover phase) when the other storage server becomes unavailable, and transfer the duties back to the other storage server when it again becomes available (giveback phase). A storage server may be taken offline intentionally (e.g., for maintenance or upgrade) or may go offline unexpectedly due to a failure. Each storage server in a cluster provides information to the other storage server(s) about its own operational status, so another storage server in the cluster can take over from a storage server that goes offline.
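The takeover and giveback phases described above may be sketched, under simplifying assumptions, as follows. The class `StorageServer`, the function `reconcile` and the volume names are hypothetical; the sketch assumes a two-server cluster in which each server learns its partner's operational status through the `online` flag, and it omits timing, fencing and error handling.

```python
from dataclasses import dataclass, field

@dataclass
class StorageServer:
    name: str
    own_volumes: set
    online: bool = True
    serving: set = field(default_factory=set)

    def __post_init__(self):
        # A healthy server starts out serving exactly its own volumes.
        self.serving = set(self.own_volumes)

def reconcile(local, partner):
    """Adjust what `local` serves, based on the partner's reported status."""
    if not partner.online:
        # Takeover phase: the partner is offline, so serve its volumes as well.
        local.serving |= partner.own_volumes
        partner.serving.clear()
    elif partner.own_volumes & local.serving:
        # Giveback phase: the partner is available again, so return its volumes.
        local.serving -= partner.own_volumes
        partner.serving = set(partner.own_volumes)

a = StorageServer("a", {"vol0"})
b = StorageServer("b", {"vol1"})

b.online = False              # b goes offline, intentionally or due to a failure
reconcile(a, b)
during_failover = set(a.serving)

b.online = True               # b becomes available again
reconcile(a, b)
after_giveback = set(a.serving)
```

During the takeover phase in this sketch, server `a` carries the load of both servers with the resources of one.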
Conventional network storage solutions are modeled on a quality of service (QoS) paradigm that attempts to guarantee system performance levels (e.g., input-output operations per second). In a failover situation, however, the QoS model breaks down because the network's activity load is distributed over a system with diminished resources (i.e., fewer storage servers). What is needed, therefore, is a technique for managing network service levels before, during and after failover that provides meaningful, adaptive controls based on the availability of system resources and the context-dependent needs of clients.