A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of disk blocks configured to store information, such as text, whereas the directory may be implemented as a specially-formatted file in which information about other files and directories is stored. A filer may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a file system protocol, executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the filer over the network.
A common type of file system is a “write in-place” file system, an example of which is the conventional Berkeley fast file system. By “file system” it is meant generally a structuring of data and metadata on a storage device, such as disks, which permits reading/writing of data on those disks. In a write in-place file system, the locations of the data structures, such as inodes and data blocks, on disk are typically fixed. An inode is a data structure used to store information, such as metadata, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
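The write in-place behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a real file system: the `Inode` and `InPlaceDisk` classes, the `write_block` helper, and the 4 KB block size are hypothetical, and only direct block pointers are modeled (indirect blocks are omitted). It shows that rewriting an existing block reuses its fixed on-disk location, while extending the file allocates a new block and updates the inode to reference it.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


@dataclass
class Inode:
    """Hypothetical inode: file metadata plus direct pointers to data blocks."""
    owner: str
    permissions: int
    file_type: str
    size: int = 0
    block_ptrs: list = field(default_factory=list)  # on-disk block addresses


class InPlaceDisk:
    """Hypothetical disk: a fixed array of blocks and a simple free list."""

    def __init__(self, nblocks: int):
        self.blocks = [b""] * nblocks
        self.free = list(range(nblocks))

    def allocate(self) -> int:
        return self.free.pop(0)  # raises IndexError when the disk is full


def write_block(disk: InPlaceDisk, inode: Inode, index: int, data: bytes) -> None:
    """Write logical block `index` of a file using write in-place semantics."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("data exceeds one block")
    if index < len(inode.block_ptrs):
        # Existing block: overwrite it at its fixed on-disk location.
        disk.blocks[inode.block_ptrs[index]] = data
    elif index == len(inode.block_ptrs):
        # Extending the file: allocate a block and update the inode to reference it.
        addr = disk.allocate()
        disk.blocks[addr] = data
        inode.block_ptrs.append(addr)
    else:
        raise ValueError("sparse writes are not modeled in this sketch")
    inode.size = max(inode.size, index * BLOCK_SIZE + len(data))


# Usage: overwrite block 0 in place, then extend the file by one block.
disk = InPlaceDisk(8)
ino = Inode(owner="alice", permissions=0o644, file_type="regular")
write_block(disk, ino, 0, b"hello")
write_block(disk, ino, 0, b"HELLO")  # lands at the same address as before
write_block(disk, ino, 1, b"more")
```

After these calls the inode still points at block 0 for its first logical block, and only the append consumed a new block.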
Another type of file system is a write-anywhere file system that does not over-write data on disks. If a data block is retrieved (read) from disk into memory and “dirtied” with new data, the modified block is stored (written) to a new location on disk, thereby optimizing write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. A particular example of a write-anywhere file system that is configured to operate on a filer is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc. of Sunnyvale, Calif. The WAFL file system is implemented within a microkernel as part of the overall protocol stack of the filer and associated disk storage. This microkernel is supplied as part of Network Appliance's Data ONTAP™ software, residing on the filer, that processes file-service requests from network-attached clients.
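For contrast with write in-place, a minimal Python sketch of the write-anywhere rule follows. It is not how WAFL is implemented: the `WriteAnywhereDisk` class, its sequential allocator, and the `write_block_anywhere` helper are hypothetical, and reclaiming the superseded block is not modeled. A dirtied block is always written to a fresh location and the file's pointer is moved to it, leaving the old on-disk copy untouched.

```python
class WriteAnywhereDisk:
    """Hypothetical disk that never overwrites a block once written."""

    def __init__(self, nblocks: int):
        self.blocks = [b""] * nblocks
        self.next_free = 0  # sequential allocation keeps new writes contiguous

    def allocate(self) -> int:
        if self.next_free >= len(self.blocks):
            raise RuntimeError("no free blocks (reclamation not modeled)")
        addr = self.next_free
        self.next_free += 1
        return addr


def write_block_anywhere(disk, block_ptrs, index, data):
    """Write a dirtied logical block to a new location; return the old address."""
    if index > len(block_ptrs):
        raise ValueError("sparse writes are not modeled in this sketch")
    new_addr = disk.allocate()
    disk.blocks[new_addr] = data  # the previous on-disk copy is left untouched
    if index == len(block_ptrs):
        block_ptrs.append(new_addr)
        return None
    old_addr = block_ptrs[index]
    block_ptrs[index] = new_addr  # repoint the file at the new copy
    return old_addr


# Usage: write block 0, then dirty and rewrite it.
disk = WriteAnywhereDisk(8)
ptrs = []
write_block_anywhere(disk, ptrs, 0, b"v1")
old = write_block_anywhere(disk, ptrs, 0, b"v2")
```

Because the allocator hands out consecutive addresses, successive writes land next to each other on disk, which is the property the paragraph above associates with good write performance.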
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a computer that manages data access and may, in the case of a filer, implement file system semantics, such as the Data ONTAP™ storage operating system, implemented as a microkernel, and available from Network Appliance, Inc. of Sunnyvale, Calif., which implements a Write Anywhere File Layout (WAFL™) file system. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
An illustrative block diagram of an inode-based file system 100 is shown in FIG. 1. A file system information block 105 includes various metadata describing the file system. Linked to the file system information block 105 is a root inode 110 of the file system. The root inode 110 contains pointers to inode file indirect blocks 115. These inode file indirect blocks 115 contain pointers to inode file direct blocks 120. Inode file direct blocks 120 point to inodes 122, which, in turn, contain pointers to indirect inodes 124. The indirect inodes 124 contain pointers to file data blocks 125(A-C). In the example of a WAFL-based file system, each file data block 125(A-C) stores 4 kilobytes (KB) of data.
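The pointer chain of FIG. 1 can be modeled as a tree and walked from the file system information block 105 down to the file data blocks 125(A-C). This Python sketch uses a hypothetical `Block` class and `data_blocks` generator, labels nodes with the reference numerals from the figure, and shows a single pointer at each intermediate level for readability, whereas a real file system fans out at each level.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical node in the FIG. 1 hierarchy, identified by its label."""
    label: str
    children: list = field(default_factory=list)  # blocks this one points to


def data_blocks(node):
    """Yield the labels of the leaf (file data) blocks reachable from `node`."""
    if not node.children:
        yield node.label
        return
    for child in node.children:
        yield from data_blocks(child)


data = [Block("125A"), Block("125B"), Block("125C")]
fsinfo = Block("105", [                     # file system information block
    Block("110", [                          # root inode
        Block("115", [                      # inode file indirect block
            Block("120", [                  # inode file direct block
                Block("122", [              # inode
                    Block("124", data),     # indirect inode
                ]),
            ]),
        ]),
    ]),
])

leaves = list(data_blocks(fsinfo))
total_bytes = len(leaves) * 4 * 1024  # 4 KB per file data block, as in WAFL
```

Here `leaves` lists the three file data blocks and `total_bytes` is 12,288, i.e., three 4 KB blocks.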
Disk storage is typically implemented as one or more storage “volumes” that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is associated with its own file system and, for purposes hereof, volume and file system shall generally be used synonymously. The disks within a volume are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
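In RAID 4, the parity block for each stripe is the bitwise XOR of the data blocks in that stripe and is kept on a dedicated parity disk, so any single lost data block can be recomputed from the surviving blocks and the parity. A minimal Python sketch, with hypothetical helper names and tiny 2-byte blocks for readability:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_of(stripe):
    """Parity block written to the dedicated parity disk for one stripe."""
    return xor_blocks(stripe)


def reconstruct(stripe, parity, failed):
    """Rebuild the block of data disk `failed`; its entry in `stripe` is ignored."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    return xor_blocks(survivors + [parity])


stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]  # one block per data disk
parity = parity_of(stripe)
rebuilt = reconstruct(stripe, parity, failed=1)
```

XOR works here because each byte XORed with itself cancels, so combining the parity with every surviving block leaves exactly the missing one.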
Known storage operating systems typically contain a program to check and repair an associated file system. Examples of such file system checking programs include the UNIX-based fsck program and the CHKDSK (Check Disk) command on Microsoft Windows®-based systems. These known file system checking programs typically execute while the file system being verified is offline. By “offline” it is meant that the file system is not available for data access by users of the file system.
An example of a known file system verification program is the WAFL Check program available from Network Appliance, Inc. of Sunnyvale, Calif. The WAFL Check program executes on file servers running Network Appliance's Data ONTAP storage operating system and checks and repairs file systems using the WAFL file system.
The WAFL Check program operates in two phases: an inode phase and a directory phase. In the inode phase, the file system verification program examines each buffer tree associated with an inode. A “buffer tree” is the tree of indirect and direct inode data blocks, linked by pointers, which, in turn, point to the file data blocks on the disks that comprise the file system. The WAFL Check program moves down each buffer tree and verifies that all pointers are valid and that no cross links occur. By “cross link” it is meant that an inode or file data block has multiple pointers to it.
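The inode phase can be sketched as a depth-first walk of each buffer tree. This is a hypothetical Python illustration, not the WAFL Check implementation: the dictionaries standing in for on-disk structures, the `inode_phase` function, and the rule that a valid pointer is any block address inside the volume are all assumptions, and the sketch only reports errors rather than repairing them. A block reached more than once, from any inode, is reported as cross-linked; the walk does not descend into a block it has already visited, which also keeps it from looping on a corrupted tree.

```python
from collections import Counter


def inode_phase(inodes, indirect, nblocks):
    """Walk each inode's buffer tree, checking pointers and cross links.

    inodes:   {inode_number: [top-level block pointers]}   (hypothetical layout)
    indirect: {block_number: [child block pointers]} for indirect blocks
    nblocks:  number of blocks in the volume; valid addresses are 0..nblocks-1
    Returns (invalid_pointers, cross_linked_blocks).
    """
    refs = Counter()
    invalid = []
    for ino, ptrs in sorted(inodes.items()):
        stack = list(ptrs)
        while stack:
            blk = stack.pop()
            if not 0 <= blk < nblocks:
                invalid.append((ino, blk))  # pointer outside the volume
                continue
            refs[blk] += 1
            if refs[blk] > 1:
                continue  # already visited: do not descend into it again
            stack.extend(indirect.get(blk, []))
    cross_linked = sorted(blk for blk, n in refs.items() if n > 1)
    return invalid, cross_linked


# Usage: inode 1's indirect block 2 and inode 2 both point at block 5,
# and inode 2 also holds a bogus pointer to block 42.
inodes = {1: [2], 2: [5, 42]}
indirect = {2: [3, 5]}
invalid, cross = inode_phase(inodes, indirect, nblocks=10)
```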
In the directory phase, the WAFL Check program verifies the directory structure stored within the file system. In the example of the WAFL Check program, the checking process first goes through all of the file inodes of the file system and then through all of the directories of the file system. After making these two passes through the file system, and correcting any noted errors in the file system, the checking program completes.
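The directory phase can be sketched in the same spirit. The checks below are assumptions chosen for illustration, since the specific directory checks are not spelled out here: every directory entry must name an in-use inode, and every directory must be reachable from the root directory. The `directory_phase` function and its dictionary inputs are hypothetical, and, as with the inode-phase sketch, errors are reported rather than repaired.

```python
from collections import deque


def directory_phase(root, allocated, directories):
    """Breadth-first walk of the directory structure from `root`.

    allocated:   set of in-use inode numbers            (hypothetical layout)
    directories: {dir_inode: {entry_name: child_inode}}
    Returns (dangling_entries, orphaned_directories).
    """
    dangling = []
    reached = {root}
    queue = deque([root])
    while queue:
        d = queue.popleft()
        for name, child in sorted(directories.get(d, {}).items()):
            if child not in allocated:
                dangling.append((d, name, child))  # entry names a free inode
            elif child in directories and child not in reached:
                reached.add(child)
                queue.append(child)
    orphans = sorted(set(directories) - reached)  # directories never reached
    return dangling, orphans


# Usage: entry "b" in the root names unallocated inode 9,
# and directory 7 is not linked from anywhere.
allocated = {2, 3, 4, 7}
directories = {2: {"a": 3, "b": 9}, 3: {"c": 4}, 7: {}}
dangling, orphans = directory_phase(2, allocated, directories)
```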
Typically, file system verification programs are executed when a user believes that there is a problem with the file system, for example, after a crash or other file system failure has occurred. A noted disadvantage of known file system checking programs is the substantial amount of time required to perform the file system check. Because the file system is offline and unavailable for data access by users while the check runs, this checking time results in a prolonged period in which the data stored in the file system is unavailable to users.