This invention relates to network storage systems, and more particularly to data storage systems including file servers for managing a number of attached storage volumes.
A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored.
A filer may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the file system on the filer by issuing file system protocol messages (in the form of packets) to the filer over the network.
A common type of file system is a “write in-place” file system, an example of which is the conventional Berkeley fast file system. In a write in-place file system, the locations of the data structures, such as inodes and data blocks, on disk are typically fixed. An inode is a data structure used to store information, such as meta-data, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
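The inode arrangement described above can be sketched as follows. This is a minimal illustration only, not the Berkeley fast file system layout: the block size, the number of direct pointers, the single level of indirection, the dictionary standing in for a disk, and all names (`Inode`, `append_block`, `block_numbers`, `allocate`) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NUM_DIRECT = 2  # assumed number of direct pointers held in the inode itself


@dataclass
class Inode:
    owner: str                       # ownership of the file
    mode: int                        # access permission for the file
    size: int = 0                    # size of the file in bytes
    file_type: str = "regular"       # file type
    direct: List[int] = field(default_factory=list)  # block numbers of data blocks
    indirect: Optional[int] = None   # block number of an indirect block, if any


def allocate(disk: Dict[int, object]) -> int:
    """Return the next unused block number (hypothetical allocator)."""
    return max(disk, default=-1) + 1


def append_block(inode: Inode, disk: Dict[int, object], data: bytes) -> None:
    """Extend a file by one data block and update the inode to reference it."""
    new_block = allocate(disk)
    disk[new_block] = data
    if len(inode.direct) < NUM_DIRECT:
        inode.direct.append(new_block)
    else:
        if inode.indirect is None:
            # Larger files reference further data blocks through an indirect block.
            inode.indirect = allocate(disk)
            disk[inode.indirect] = []
        disk[inode.indirect].append(new_block)
    inode.size += len(data)


def block_numbers(inode: Inode, disk: Dict[int, object]) -> List[int]:
    """List the file's data block numbers, following the indirect block if present."""
    blocks = list(inode.direct)
    if inode.indirect is not None:
        blocks.extend(disk[inode.indirect])
    return blocks
```

In this sketch, small files are reached entirely through the direct pointers, and the indirect block is allocated only once the quantity of data exceeds what the direct pointers can reference.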
Another type of file system is a write-anywhere file system that does not over-write data on disks. If a data block on disk is retrieved (read) from disk into memory and “dirtied” with new data, the data is stored (written) to a new location on disk to thereby optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. A particular example of a write-anywhere file system that is configured to operate on a filer is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc. of Sunnyvale, Calif. The WAFL file system is implemented within a microkernel as part of the overall protocol stack of the filer and associated disk storage. This microkernel is supplied as part of Network Appliance's Data ONTAP™ software, residing on the filer, that processes file-service requests from network-attached clients.
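The write-anywhere behavior can be illustrated with a short sketch. It is not the WAFL on-disk layout; the class name `WriteAnywhereStore`, the logical-to-physical map, and the sequential allocation of new locations are assumptions chosen only to show that a dirtied block is written to a new location rather than over its old one.

```python
class WriteAnywhereStore:
    """Toy block store in which an update never overwrites the old location."""

    def __init__(self):
        self.blocks = {}     # physical location -> data
        self.block_map = {}  # logical block id -> current physical location
        self.next_free = 0   # next unused physical location (assumed sequential)

    def write(self, logical_id, data):
        location = self.next_free          # always a fresh location
        self.next_free += 1
        self.blocks[location] = data
        self.block_map[logical_id] = location  # repoint the logical block
        return location

    def read(self, logical_id):
        return self.blocks[self.block_map[logical_id]]
```

Writing the same logical block twice leaves the earlier data untouched at its original location, while reads follow the updated map to the new location.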
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a storage system that implements file system semantics (such as the above-referenced WAFL) and manages data access. In this sense, ONTAP software is an example of such a storage operating system implemented as a microkernel. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
Disk storage is typically implemented as one or more storage “volumes” that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is associated with its own file system and, for purposes hereof, volume and file system shall generally be used synonymously. The disks within a volume are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. In the example of a WAFL file system, a RAID 4 implementation is advantageously employed. This implementation specifically entails the striping of data across a group of disks, and separate parity caching within a selected disk of the RAID group. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
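As a brief illustration of how a dedicated parity disk protects a stripe, the following sketch computes byte-wise XOR parity, the conventional RAID 4 technique, and rebuilds one lost data block from the survivors. The function names and the three-disk stripe are assumptions for the example and are not drawn from any particular filer implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks):
    """Parity block stored on the dedicated parity disk for one stripe."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


# One stripe across three hypothetical data disks.
stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
p = parity(stripe)
# Simulate losing the second disk and recovering its block.
recovered = reconstruct([stripe[0], stripe[2]], p)
```

Because XOR is its own inverse, combining the parity block with every surviving data block yields the missing block, which is why a single disk failure in the group is tolerated.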
More than one filer can reside on a single network (LAN, WAN, etc.), for access by network-connected clients and servers. Where multiple filers are present on the network, each filer may be assigned responsibility for a certain set of volumes. The filers may be connected in a cluster using a separate physical interconnect or linking communication protocol that passes over the network (e.g. the LAN, etc.). In the event of a failure or shutdown of a given filer, its volume set can be reassigned to another filer in the cluster to maintain continuity of service. In the case of an unscheduled shutdown, various failover techniques are employed to preserve and restore file service, as described generally in commonly owned U.S. patent application Ser. No. 09/933,866 entitled GRACEFUL TAKEOVER IN A NODE CLUSTER by Naveen Bali et al., the teachings of which are expressly incorporated herein by reference. Such techniques involve (a) the planned and unplanned takeover of a filer's volumes by a cluster partner filer upon filer shutdown; and (b) the giveback of the taken-over volumes to the original filer upon reinitialization of the downed filer. A management station can also reside on the network, as a specialized client that includes storage management software used by a system administrator to manipulate and control the storage-handling by networked filers.
A filer can be made more reliable and stable in the event of a system shutdown or other unforeseen problem by employing a backup memory consisting of a non-volatile random access memory (NVRAM) as part of its architecture. An NVRAM is typically a large-volume solid-state memory array (RAM) having either a backup battery, or other built-in last-state-retention capabilities (e.g. a FLASH memory), that holds the last state of the memory in the event of any power loss to the array.
In a known implementation, as a client transaction request is completed by the storage operating system, that request is logged to the NVRAM as a journal entry. The NVRAM is loaded with requests until such time as a consistency point (CP) is reached. CPs occur at fixed time intervals, or when other key events arise. Each time a CP occurs, the requests logged in the NVRAM are subsequently overwritten once the results of the requests are written from the filer's conventional RAM buffer cache to disk. This is because once a root inode is written from cache to the disk, then the logged data in the NVRAM is no longer needed, and it may be overwritten or otherwise cleared. Immediately thereafter, the NVRAM is available for the logging of new requests. The process continues as each CP occurs, at which time the entry count of the NVRAM log is reset (allowing overwrite), and cached results of client requests are transferred to disk. In general, the NVRAM log is replayed to re-perform any requests logged therein for its own filer (and an associated cluster partner filer, if any) between the last CP and an interruption in storage handling. In addition, the log is replayed during reboot.
In the event of an unexpected shutdown, power failure or other system problem, which interrupts the normal flow of information among the client, storage operating system, and the disks, the NVRAM can be used to recover information logged since the last CP prior to the interruption event.
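The logging, consistency point, and replay cycle described above can be sketched as follows. This is a simplified model under stated assumptions: a list stands in for the NVRAM log, dictionaries stand in for the RAM buffer cache and the disks, each request is a simple key/value write, and the class and method names are hypothetical.

```python
class FilerModel:
    """Toy model of NVRAM journaling between consistency points (CPs)."""

    def __init__(self):
        self.nvram_log = []  # stands in for NVRAM; survives a crash
        self.cache = {}      # volatile RAM buffer cache; lost on a crash
        self.disk = {}       # persistent disk contents

    def handle_request(self, key, value):
        self.cache[key] = value               # result held in the buffer cache
        self.nvram_log.append((key, value))   # completed request is journaled

    def consistency_point(self):
        self.disk.update(self.cache)  # cached results are written to disk
        self.nvram_log.clear()        # logged requests are no longer needed

    def crash(self):
        self.cache = {}               # only volatile state is lost

    def replay(self):
        # Rebuild the cache from disk, then re-perform the requests
        # logged since the last CP.
        self.cache = dict(self.disk)
        for key, value in self.nvram_log:
            self.cache[key] = value
```

A request logged after the last CP is absent from disk when the interruption occurs, yet it reappears in the rebuilt cache once the log is replayed.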
Having discussed typical filer components and operation, we now turn to the transitioning of volumes. The physical disks of a volume may need to be taken offline (i.e., “unmounted”), in which state the associated file system is not allowed to access data on the volume's disks. Subsequently, the disks may be brought back online (i.e., “mounted”) with that filer, or another differing filer, in which state the associated file system is again allowed to access the data on the disks. This process of mounting and/or unmounting the volume can occur for a variety of reasons. For example, when the filer is serviced, an unmount/mount may occur. Servicing of a particular disk within the overall volume disk array may require unmounting. Moreover, unmounting is involved whenever a filer (or the overall filer cluster) is gracefully shut down or rebooted. Mounting is also a required process in the above-described failover scenario, whereby the storage previously managed by a failed or otherwise unavailable filer is subsequently remounted in association with the cluster partner filer, while unmounting is required as part of the giveback process, when the storage taken over by a cluster partner is unmounted so that it can be made available to the filer normally responsible for managing it.
In general, prior art processes for transitioning of volumes between mounted and unmounted states involve the use of a specific control thread that does not make information regarding the mounted and unmounted states available (or recorded) externally to that control thread. In other words, other procedures in the overlying file system may not have knowledge of the particular stage of mount or unmount currently existing with respect to the transitioning volume because the mount/unmount control thread is isolated from other control threads within the file system.
One commonly employed technique to facilitate a graceful mount and unmount of a volume entails the taking-offline and subsequent reboot of the attached filer. However, as the size of attached storage volumes increases radically for a given filer, mounting or unmounting a volume by completely rebooting a filer becomes highly undesirable and leads to potential unavailability of an extensive quantity of data. Additionally, the growth in the number of volumes attached to a given filer makes the problem of isolated control threads of greater concern. This is because various file system control threads may need to be invoked during various stages of the mounting/unmounting process (such as filer failover procedures). The file system needs to “know” when to invoke such system procedures/requests. In the past, flag bits in predetermined control structures were set to indicate the volume mounting and unmounting states. However, this approach is cumbersome, and is susceptible to errors when new or different control threads are introduced (e.g. upgrades and repairs) for which no flag bits have been provided by the file system procedures.
In one example, procedures that operate continuously and in background, such as the daemon for elimination of so-called “zombie” files (e.g. large files that exist on-disk but that are no longer needed), must be terminated or they will continue to interact with disks. This interaction may prevent unmount, as requests to the unmounting volume never cease. The delete operation is described in co-pending and commonly-assigned U.S. patent application Ser. No. 09/642,066, entitled MANIPULATION OF ZOMBIE FILES AND EVIL-TWIN FILES, by Ray Chen et al., which application is hereby incorporated by reference. In addition, clients/users connected through the Common Internet File System (CIFS) protocol require notification to cleanly break their logical connection with the filer, and thus to cease the continuous provision of requests to the file system for the unmounting volume.
Likewise, certain “exogenous” file system procedures/requests may be harmful during particular stages of the mount/unmount process, and need to be avoided. In order to ensure that a volume may mount or unmount “gracefully,” and without system failure, the progressive control of file system requests during unmounting and mounting is desired.
It is, therefore, an object of this invention to provide a system and method within the overall file system architecture for ensuring that volume mounting and unmounting occurs gracefully, with desired file system requests being allowed, and with undesired exogenous requests being restricted, during mount and unmount. This system and method should also allow file system procedures to be readily added, deleted or modified in relative isolation from the mounting and unmounting procedure, while still ensuring that mounting and unmounting always occurs gracefully.
This invention overcomes the disadvantages of the prior art by providing a system and method for mounting and unmounting volumes attached to, managed by, or part of, a storage system, such as a file server, by tracking (with an appropriate mechanism in the storage operating system) specific sub-states within each of the overall mounting and unmounting procedure states, in which specific file system operations (such as requests) are selectively permitted or restricted (as appropriate) with respect to the mounting/unmounting volume based upon the sub-state.
More specifically, for mounting or unmounting, the storage operating system transitions a volume through a series of sub-states, as tracked in appropriate control structures. Each sub-state is characterized by a set of permitted operations that may be performed while the volume is in that sub-state, entrance and exit criteria for that sub-state, and restricted operations that may not be performed while the volume is in that sub-state. During transaction request handling, the storage operating system validates each request against the sub-state to determine the disposition of the request. Then, depending on the request and the sub-state, the storage operating system will either execute the request, ignore the request, hold the request for later execution, or return an error message to the client originating the request. Typically, the restrictions on requests during mounting lessen as the volume nears a mounted state, and the reverse is true as the volume nears an unmounted state. Thus, these sub-states regulate activities such as request processing to the volumes and the draining of remaining unfinished processes as well as the shutdown or isolation of the volume from long-term processes such as defragmentation and other file maintenance utilities that act upon the volume's files.
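The validation of a request against the current sub-state can be sketched as a table lookup that yields one of the four dispositions named above. The table contents, the request kinds (`read_metadata`, `client_write`, `zombie_scan`), and the choice to reject anything unlisted are illustrative assumptions, not the control structures of any actual storage operating system.

```python
from enum import Enum, auto


class Disposition(Enum):
    EXECUTE = auto()  # perform the request now
    IGNORE = auto()   # drop the request silently
    HOLD = auto()     # suspend the request for later execution
    ERROR = auto()    # return an error to the originating client


# Hypothetical policy table: sub-state -> {request kind -> disposition}.
# The specific entries are assumptions chosen for illustration.
POLICY = {
    "LOADING": {"read_metadata": Disposition.EXECUTE},
    "FINAL": {
        "read_metadata": Disposition.EXECUTE,
        "client_write": Disposition.HOLD,
    },
    "WAIT": {"zombie_scan": Disposition.IGNORE},
}


def validate(sub_state, request_kind):
    """Return the disposition of a request for a volume in the given sub-state.

    Requests not listed for the sub-state are rejected with an error.
    """
    return POLICY.get(sub_state, {}).get(request_kind, Disposition.ERROR)
```

Keeping the policy in one table, rather than in per-thread flag bits, means a new request kind can be added by extending the table without touching the mount or unmount procedure itself.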
According to a preferred embodiment, the storage operating system encodes the sub-states in control blocks or structures associated with the file system. Each volume is associated with an overall state, including a VOLUME MOUNTING state and a VOLUME UNMOUNTING state, which each define, respectively, progress toward final mount or unmount. The VOLUME MOUNTING state has specific sub-states at which various file system activities are permitted or rejected with respect to the subject volume, and the VOLUME UNMOUNTING state has other specific sub-states at which certain file system activities are permitted or rejected with respect to the subject volume.
For the VOLUME MOUNTING overall state, the sub-states include, in order of progression: (a) LOADING, in which generally only file system requests devoted to reading meta-data from the volume are permitted and other exogenous requests are rejected; (b) INITIALIZING, in which internal files needed for the mounting procedure, but not part of the volume meta-data, are initialized, and other requests are rejected; and (c) FINAL, in which initialization of the file system with respect to the mounted volumes has been completed, but the volume may not be ready to respond to external file system requests, due to the needed replay of an NVRAM log. In one embodiment, volumes may be directed to mount individually using a “volume online” command by an operator, or as a group, based upon a reboot of a filer or takeover of volumes owned by an interconnected cluster partner filer.
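The ordered progression of mounting sub-states can be sketched as a small state holder. The sub-state names follow the text above; the `MountingVolume` class, its `advance` method, and the `VOLUME MOUNTED` label for the end state are assumptions made for the example.

```python
from enum import IntEnum


class MountSubState(IntEnum):
    LOADING = 1        # only meta-data reads are permitted
    INITIALIZING = 2   # internal files for mounting are initialized
    FINAL = 3          # initialized, but NVRAM log replay may be pending


class MountingVolume:
    def __init__(self, name):
        self.name = name
        self.state = "VOLUME MOUNTING"
        self.sub_state = MountSubState.LOADING

    def advance(self):
        """Move to the next sub-state, or finish mounting after FINAL."""
        if self.state != "VOLUME MOUNTING":
            raise RuntimeError("volume is not mounting")
        if self.sub_state == MountSubState.FINAL:
            self.state = "VOLUME MOUNTED"  # assumed name for the end state
            self.sub_state = None
        else:
            self.sub_state = MountSubState(self.sub_state + 1)
```

Because the sub-state is recorded on the volume object rather than inside a single control thread, any other procedure can inspect it to learn how far the mount has progressed.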
For the VOLUME UNMOUNTING overall state, the sub-states include, in order of progression: (a) PREPARE, in which all file system requests are handled normally, but no changes to the volume data can be made other than those required to unmount the volume; (b) REFERENCE DRAIN (for unmounting individual volumes) or GIVEBACK DRAIN (for giveback from a cluster partner filer to a volume owner-filer), in which the external procedures/requests referencing the volume (“volume references”) are allowed to complete, while no new volume references or other requests for the volume are permitted; (c) DRAIN, in which all active file system requests (possibly restarted after suspension) are processed to completion, and no new file system requests other than certain specified native requests (such as, for example, CP-related requests) are permitted (e.g. specifically no exogenous requests); (d) WAIT, in which internal, long term processes, such as zombie processing, are terminated and new file system requests and most internal requests are rejected; and (e) FINAL, in which data structures storing the file system state are released. The file system may, according to one embodiment, perform a plurality of consistency point operations to clear any remaining requests related to the unmounted volumes from NVRAM.
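The unmounting progression, including the choice between REFERENCE DRAIN and GIVEBACK DRAIN, can be sketched as follows. The sub-state names come from the text above; the reference counter, the rule that the drain sub-state is not left until the count reaches zero, and all class and method names are assumptions introduced to illustrate an exit criterion.

```python
def unmount_sequence(giveback=False):
    """Ordered unmounting sub-states for an individual unmount or a giveback."""
    drain = "GIVEBACK DRAIN" if giveback else "REFERENCE DRAIN"
    return ["PREPARE", drain, "DRAIN", "WAIT", "FINAL"]


class UnmountingVolume:
    def __init__(self, giveback=False, references=0):
        self.sequence = unmount_sequence(giveback)
        self.index = 0
        self.references = references  # outstanding volume references (assumed counter)

    @property
    def sub_state(self):
        return self.sequence[self.index]

    def add_reference(self):
        # New volume references are refused once PREPARE has been left.
        if self.sub_state != "PREPARE":
            raise PermissionError("no new volume references while unmounting")
        self.references += 1

    def release_reference(self):
        self.references -= 1

    def advance(self):
        """Try to move to the next sub-state; return True if the move happened."""
        draining = self.sub_state in ("REFERENCE DRAIN", "GIVEBACK DRAIN")
        if draining and self.references > 0:
            return False  # assumed exit criterion: existing references must complete
        if self.index < len(self.sequence) - 1:
            self.index += 1
            return True
        return False  # already in FINAL
```

In this model a volume with an outstanding reference stays in the drain sub-state until that reference is released, after which it proceeds through DRAIN and WAIT to FINAL.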