A network storage appliance is a special-purpose computer that provides file service relating to the organization of information on storage devices, such as disks. The network storage appliance or filer includes an operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored.
A filer may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the file system on the filer by issuing file system protocol messages (in the form of packets) to the filer over the network.
A common type of file system is a “write in-place” file system, an example of which is the conventional Berkeley fast file system. In a write in-place file system, the locations of the data structures, such as inodes and data blocks, on disk are typically fixed. An inode is a data structure used to store information, such as meta-data, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
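By way of illustration only, the relationship between an inode, its pointers, and indirect blocks may be sketched as follows. The class name `Inode`, its field names, and the helper `data_block_numbers` are hypothetical and are not drawn from any particular file system; a single level of indirection is assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inode:
    """Hypothetical inode: meta-data plus references to data blocks."""
    owner: str
    mode: int                       # access permission bits
    size: int = 0                   # file size in bytes
    file_type: str = "regular"
    direct: List[int] = field(default_factory=list)    # disk block numbers of data blocks
    indirect: List[int] = field(default_factory=list)  # disk block numbers of indirect blocks


def data_block_numbers(inode: Inode, indirect_blocks: Dict[int, List[int]]) -> List[int]:
    """Return the disk block numbers of the file's data, in file order.

    `indirect_blocks` maps an indirect block's number to the list of
    data block numbers that the indirect block references.
    """
    blocks = list(inode.direct)
    for indirect_no in inode.indirect:
        blocks.extend(indirect_blocks[indirect_no])
    return blocks


# A three-block file: two blocks referenced directly, one through an indirect block.
inode = Inode(owner="alice", mode=0o644, size=3 * 4096, direct=[10, 11], indirect=[50])
print(data_block_numbers(inode, {50: [12]}))
```

In a write in-place arrangement, extending this file would allocate a further data block and add its number to the inode (or to an indirect block), while the existing block numbers remain fixed.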
Another type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block on disk is retrieved (read) from disk into memory and “dirtied” with new data, the data block is stored (written) to a new location on disk to thereby optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. A particular example of a write-anywhere file system that is configured to operate on a filer is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc. of Sunnyvale, Calif. The WAFL file system is implemented as a microkernel within the overall protocol stack of the filer and associated disk storage. This microkernel is supplied as part of Network Appliance's Data ONTAP™ software, residing on the filer, that processes file-service requests from network-attached clients.
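The write-anywhere behavior described above may be sketched, under simplifying assumptions, as a volume that never overwrites a block and instead redirects a file's block pointer to a newly allocated location. The class `WriteAnywhereVolume` and its sequential allocator are hypothetical and do not represent the actual WAFL implementation.

```python
from typing import Dict


class WriteAnywhereVolume:
    """Toy volume: a dirtied block is always written to a new disk location."""

    def __init__(self) -> None:
        self.disk: Dict[int, bytes] = {}       # disk block number -> contents
        self.block_map: Dict[int, int] = {}    # file block index -> disk block number
        self.next_free = 0                     # assumed: simple sequential allocator

    def write(self, file_block: int, data: bytes) -> int:
        """Store data at a fresh location and repoint the file block to it."""
        new_location = self.next_free
        self.next_free += 1
        self.disk[new_location] = data
        self.block_map[file_block] = new_location
        return new_location

    def read(self, file_block: int) -> bytes:
        return self.disk[self.block_map[file_block]]


volume = WriteAnywhereVolume()
first = volume.write(0, b"old contents")
second = volume.write(0, b"new contents")
print(first != second, volume.read(0))
```

In this sketch the previously written block is left untouched on disk; only the reference to it changes, which is the property that distinguishes the scheme from a write in-place file system.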
The disk storage is typically implemented as one or more storage “volumes” that comprise a cluster of physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is generally associated with its own file system (WAFL for example). The disks within a volume/file system are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. In the example of a WAFL-based file system and process, a RAID 4 implementation is advantageously employed. This implementation specifically entails the striping of data across a group of disks, and separate parity caching within a selected disk of the RAID group.
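As an illustrative sketch of striping with a dedicated parity disk, the following assumes that parity is computed as the bytewise exclusive-OR (XOR) of the data blocks in a stripe, as is conventional for RAID 4. The function names `xor_blocks` and `reconstruct` are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks: List[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block of a stripe from the rest plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# One stripe across three data disks; the parity block goes to the dedicated parity disk.
stripe = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]
parity = xor_blocks(stripe)

# Simulate losing the second data disk and rebuilding its block.
recovered = reconstruct([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])
```

Because XOR is its own inverse, any one lost block in the stripe can be rebuilt from the remaining blocks and the parity block.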
The exemplary filer may be made more reliable and stable in the event of a system shutdown or other unforeseen problem by employing a backup memory consisting of a non-volatile random access memory (NVRAM) as part of its architecture. An NVRAM is typically a large-volume solid-state memory array (RAM) having either a back-up battery, or other built-in last-state-retention capabilities (e.g., a FLASH memory), that holds the last state of the memory in the event of any power loss to the array.
As a client transaction request is completed by WAFL, that request is logged to the NVRAM as a journal entry. Such entries for a given file can include, for example, “Create File,” “Write File Data,” “Open File,” etc. Note that the results of the request, including associated file meta-data that would likely be changed by the request, are not logged to NVRAM in accordance with this arrangement. This reduces the required storage space for the NVRAM while retaining critical information for possible replay. Widely accepted file system standards, such as Network File System (NFS), specify that a file server should not reply to a requesting client until the results of a given request are written out to stable storage. Logging the request to NVRAM meets this requirement, so a reply can be returned to the requesting client with respect to the transaction before the results of the request have been written to a disk. The NVRAM is loaded with requests until such time as a consistency point (CP) is reached. CPs occur at fixed time intervals, or when other key events arise. Each time a CP occurs, the requests logged in the NVRAM are subsequently overwritten (after the NVRAM log's entry count is reset to zero), as the results of the requests are written from the filer's conventional RAM buffer cache to disk. This is because once a root inode is written from cache to the disk, the logged data in the NVRAM is no longer needed, and it may be overwritten or otherwise cleared. Immediately thereafter, the NVRAM is reloaded with new requests. The process continues as each CP occurs, at which time the entry count of the NVRAM log is reset (allowing overwrite), and cached results of client requests are transferred to disk.
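The logging and consistency point cycle described above may be sketched as follows. This is a minimal model under stated assumptions: the classes `NVRAMLog` and `Filer`, the tuple layout of a journal entry, and the choice to flush the cache before resetting the entry count are all hypothetical simplifications, and only write requests are modeled.

```python
from typing import Dict, List, Optional, Tuple

Request = Tuple[str, str, bytes]   # assumed entry layout: (operation, file name, data)


class NVRAMLog:
    """Fixed-capacity journal of requests; results are not logged."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[Request]] = [None] * capacity
        self.entry_count = 0

    def append(self, request: Request) -> None:
        if self.entry_count == len(self.slots):
            raise RuntimeError("NVRAM log full; a consistency point is needed")
        self.slots[self.entry_count] = request   # overwrites any stale entry
        self.entry_count += 1

    def reset(self) -> None:
        self.entry_count = 0                     # old entries may now be overwritten

    def pending(self) -> List[Request]:
        return [r for r in self.slots[: self.entry_count] if r is not None]


class Filer:
    def __init__(self, log: NVRAMLog) -> None:
        self.log = log
        self.buffer_cache: Dict[str, bytes] = {}   # results held in volatile RAM
        self.disk: Dict[str, bytes] = {}

    def handle_write(self, name: str, data: bytes) -> str:
        self.buffer_cache[name] = data             # result lives only in cache for now
        self.log.append(("write", name, data))     # request is in stable storage
        return "OK"                                # safe to reply before the disk write

    def consistency_point(self) -> None:
        self.disk.update(self.buffer_cache)        # cached results go to disk
        self.buffer_cache.clear()
        self.log.reset()                           # logged requests no longer needed


filer = Filer(NVRAMLog(capacity=4))
print(filer.handle_write("a", b"1"), filer.disk)   # reply sent; nothing on disk yet
filer.consistency_point()
print(filer.disk, filer.log.pending())
```

The sketch shows why the reply can precede the disk write: between consistency points, every acknowledged request remains recoverable from the log.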
However, in the event of an unexpected shutdown, power failure or other system problem, which interrupts the normal flow of information between the client, WAFL and the disks, the NVRAM must be called upon to recover information logged between the last CP and the interruption event, and that information must be replayed to Data ONTAP/WAFL so as to reconstruct the last transactions before interruption. In general, the replay process occurs seriatim, with each logged request replayed in turn (in the order it exists in the NVRAM log), until the log has been fully replayed. During this time, normal filer processes are suspended and affected volumes are inaccessible.
The processing of each NVRAM log entry requires WAFL to complete multiple phases, characterized generally by “LOAD,” “LOCK,” “MODIFY,” and “RESIZE,” before logged data is finally written to disk (via the filer's buffer cache memory). Note that LOAD and MODIFY are required phases for every message. In particular, the LOAD phase requires loading of file system data (inodes) from the disk into filer memory, and consumes substantial computing resources/time. Thereafter, the LOCK (if applicable), MODIFY and RESIZE (if applicable) phases are entered in sequence. During the MODIFY phase, the subject file and associated meta-data are modified in filer memory.
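The phases described above, applied to each log entry in turn, may be sketched as follows. The `LogEntry` fields, the operation names "create" and "write", and the flags indicating whether LOCK and RESIZE apply are hypothetical; the sketch only records the order in which the phases are entered and does not model actual locking or resizing.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogEntry:
    op: str                      # "create" or "write" (hypothetical operation names)
    file_id: int
    data: bytes = b""
    needs_lock: bool = False     # whether the LOCK phase applies to this entry
    needs_resize: bool = False   # whether the RESIZE phase applies to this entry


def replay_entry(entry: LogEntry, disk: Dict[int, bytes],
                 memory: Dict[int, bytes], trace: List[str]) -> None:
    # LOAD (always): bring the file's state from disk into filer memory.
    if entry.file_id not in memory:
        memory[entry.file_id] = disk.get(entry.file_id, b"")
    trace.append("LOAD")
    if entry.needs_lock:
        trace.append("LOCK")
    # MODIFY (always): apply the logged request to the in-memory state.
    if entry.op == "create":
        memory[entry.file_id] = b""
    elif entry.op == "write":
        memory[entry.file_id] += entry.data
    trace.append("MODIFY")
    if entry.needs_resize:
        trace.append("RESIZE")


def replay_log(entries: List[LogEntry],
               disk: Dict[int, bytes]) -> Tuple[Dict[int, bytes], List[str]]:
    """Replay entries one at a time, in log order."""
    memory: Dict[int, bytes] = {}
    trace: List[str] = []
    for entry in entries:
        replay_entry(entry, disk, memory, trace)
    return memory, trace


log = [
    LogEntry("create", 1),
    LogEntry("write", 1, b"ab", needs_resize=True),
    LogEntry("write", 1, b"cd", needs_lock=True),
]
memory, trace = replay_log(log, disk={})
print(memory[1], trace)
```

In this serial form, the LOAD for one entry does not begin until every phase of the preceding entry has finished.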
The MODIFY phase must occur in exactly the same order, with respect to other NVRAM log entries, as it did before the interruption. This procedure contrasts directly with normal filer runtime in which LOAD transactions are overlapped as concurrent access to multiple disks on the write-anywhere disk volume set occurs. As such, a normal runtime operation, which might consume a tenth or a hundredth of a second, may last tens or hundreds of seconds in replay. In addition, where the transparent failover feature of the Common Internet File System (CIFS) protocol is employed, a client time-out will occur if a server fails to respond within forty-five seconds. If the server is inaccessible for more than forty-five seconds, then a desired transparent failover cannot occur. Hence, forty-five seconds may become a hard time limit within which normal server transactions must be reactivated, and this makes rapid replay of the NVRAM log even more desirable.
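The cost difference between strictly serial LOAD operations and the overlapped LOAD operations of normal runtime may be illustrated with a rough arithmetic model. The figures used below (10,000 logged entries, 10 milliseconds per LOAD, eight disks) are assumptions chosen purely for illustration, and the model ignores all phases other than LOAD; only the forty-five second limit is taken from the discussion above.

```python
import math

CIFS_TIMEOUT_MS = 45_000   # forty-five second client time-out


def serial_load_ms(num_entries: int, load_ms: int) -> int:
    """Total LOAD time when each entry's LOAD waits for the previous entry."""
    return num_entries * load_ms


def overlapped_load_ms(num_entries: int, load_ms: int, num_disks: int) -> int:
    """Total LOAD time when LOADs are spread evenly and run concurrently across disks."""
    return math.ceil(num_entries / num_disks) * load_ms


serial = serial_load_ms(10_000, 10)
overlapped = overlapped_load_ms(10_000, 10, num_disks=8)
print(serial, serial <= CIFS_TIMEOUT_MS)          # 100000 False
print(overlapped, overlapped <= CIFS_TIMEOUT_MS)  # 12500 True
```

Under these assumed figures, serial loading alone exceeds the time-out by more than a factor of two, whereas the same loads overlapped across the disks complete well within it.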
It is, therefore, an object of this invention to provide a more efficient technique for replaying an NVRAM log following system interruption that reduces the overall processing time for logged transactions and therefore speeds the restart of normal filer operations after an interruption.