A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of disk blocks configured to store information, such as text, whereas the directory may be implemented as a specially-formatted file in which information about other files and directories is stored. A filer may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a file system protocol, executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the filer over the network.
A common type of file system is a “write in-place” file system, an example of which is the conventional Berkeley fast file system. In a write in-place file system, the locations of the data structures, such as inodes and data blocks, on disk are typically fixed. An inode is a data structure used to store information, such as meta-data, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
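The inode arrangement described above can be sketched in a short example. The field names, the number of direct pointers, and the pointers-per-indirect-block figure below are illustrative assumptions for clarity, not the on-disk layout of any particular file system:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Inode:
    """Illustrative inode: file meta-data plus references to data blocks.
    Field names and counts are hypothetical, not any real on-disk format."""
    owner: int
    permissions: int
    size: int                 # size of the file in bytes
    file_type: str
    direct: List[int] = field(default_factory=list)    # direct block pointers
    indirect: List[int] = field(default_factory=list)  # blocks of block pointers

NUM_DIRECT = 4        # assumed number of direct pointers per inode
PTRS_PER_BLOCK = 8    # assumed block pointers held by one indirect block

def block_for(inode: Inode, index: int,
              read_block: Callable[[int], List[int]]) -> int:
    """Return the on-disk block number holding logical block `index` of
    the file; larger files reach their data through indirect blocks."""
    if index < NUM_DIRECT:
        return inode.direct[index]
    rest = index - NUM_DIRECT
    indirect_block = inode.indirect[rest // PTRS_PER_BLOCK]
    return read_block(indirect_block)[rest % PTRS_PER_BLOCK]
```

As the paragraph notes, whether an indirect block must be consulted depends on the quantity of data in the file: small files resolve through direct pointers alone, while larger ones incur the extra lookup.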
Another type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block on disk is retrieved (read) from disk into memory and “dirtied” with new data, the data block is stored (written) to a new location on disk to thereby optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. A particular example of a write-anywhere file system that is configured to operate on a filer is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc. of Sunnyvale, Calif. The WAFL file system is implemented within a microkernel as part of the overall protocol stack of the filer and associated disk storage. This microkernel is supplied as part of Network Appliance's Data ONTAP™ storage operating system, residing on the filer, that processes file-service requests from network-attached clients.
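The write-anywhere behavior described above can be illustrated with a minimal sketch. The free-list and block-map bookkeeping here are assumptions made for illustration and do not reflect the WAFL implementation:

```python
class WriteAnywhereStore:
    """Toy write-anywhere block store: a dirtied block is always written
    to a new on-disk location, never back over its old one (illustrative)."""

    def __init__(self, num_blocks: int):
        self.blocks = [b""] * num_blocks
        self.free = list(range(num_blocks))  # simple free list (assumed)
        self.block_map = {}                  # logical block -> physical block

    def write(self, logical: int, data: bytes) -> int:
        new_phys = self.free.pop(0)          # always choose a new location
        self.blocks[new_phys] = data
        old = self.block_map.get(logical)
        if old is not None:
            self.free.append(old)            # the old copy becomes free space
        self.block_map[logical] = new_phys
        return new_phys

    def read(self, logical: int) -> bytes:
        return self.blocks[self.block_map[logical]]
```

Because a rewrite never seeks back to the block's original location, writes can be gathered and laid down at fresh, ideally contiguous, locations, which is the write-performance benefit the paragraph describes.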
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a storage system that manages data access and may, in the case of a filer, implement file system semantics. One example is the Data ONTAP™ storage operating system, implemented as a microkernel and available from Network Appliance, Inc. of Sunnyvale, Calif., which implements the Write Anywhere File Layout (WAFL™) file system. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
Disk storage is typically implemented as one or more storage “volumes” that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is associated with its own file system and, for purposes hereof, volume and file system shall generally be used synonymously. The disks within a volume are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. In the example of a WAFL file system, a RAID 4 implementation is advantageously employed. This implementation specifically entails the striping of data across a group of disks, and separate parity caching within a selected disk of the RAID group. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
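The striping-with-dedicated-parity arrangement of RAID 4 can be sketched as follows; the stripe geometry (three data disks plus one parity disk) is an assumption chosen for illustration:

```python
from typing import List

def xor_bytes(blocks: List[bytes]) -> bytes:
    """XOR corresponding bytes of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

def write_stripe(data_blocks: List[bytes]):
    """RAID 4 style: the parity for a stripe is the XOR of its data
    blocks, and all parity is kept on one dedicated parity disk."""
    return list(data_blocks), xor_bytes(data_blocks)

def reconstruct(surviving_blocks: List[bytes], parity: bytes) -> bytes:
    """Any single lost data block is recoverable as the XOR of the
    parity block with the surviving data blocks."""
    return xor_bytes(surviving_blocks + [parity])
```

The reconstruction step shows why parity enhances reliability: the contents of one failed disk in the group can be regenerated from the remaining disks plus the parity disk.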
A disk drive typically supports only one sector size. A disk sector (or block) is the basic storage unit of a disk drive. A disk drive comprises one or more platters of magnetic material. Each platter is divided into a number of tracks, and each track is further divided into sectors. A sector is thus the smallest addressable unit of a typical disk drive. Two common sizes of disk blocks or sectors are 512 bytes per sector (BPS) and 520 BPS.
Disk drives may sometimes prove unreliable in storing and/or returning data. Disk drives may issue spurious confirmations that an input/output (I/O) operation occurred when the operation did not occur, or that it occurred, but with incorrect data. To avoid problems from unreliable operation, and to verify data integrity, check-summing methodologies have been employed in disk read/write operations. One example of such a checksum methodology is the use of block appended checksums. Block appended checksums are described in U.S. patent application Ser. No. 09/696,666, entitled BLOCK-APPENDED CHECKSUMS, by Andy Kahn et al., filed on Oct. 15, 2000, which is hereby incorporated by reference. One known implementation of block appended checksums (BAC) utilizes 520 BPS disks wherein the first 512 bytes of the sector represent data to be stored, with the remaining eight bytes representing a checksum value. One example of a methodology to compute such a checksum is to add, without carrying, all of the data bytes. To verify the calculated checksum, the two's complement of the sum is calculated and then added, again without carrying, to the checksum. If the result is zero, the checksum is proper.
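The add-without-carry checksum described above can be sketched in a few lines. An 8-bit accumulator is assumed here for simplicity, since the text does not specify how the eight checksum bytes of the sector are filled:

```python
def checksum(data: bytes) -> int:
    """Add all data bytes without carrying, i.e., modulo 256
    (an assumed 8-bit accumulator width)."""
    return sum(data) & 0xFF

def verify(data: bytes, stored: int) -> bool:
    """Recompute the sum, take its two's complement, and add it to the
    stored checksum without carrying; a zero result means the checksum
    is proper."""
    recomputed = sum(data) & 0xFF
    return (((-recomputed) & 0xFF) + stored) & 0xFF == 0
```

In the 520 BPS layout described above, `data` would be the first 512 bytes of a sector and the stored checksum would occupy the trailing eight bytes.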
One noted disadvantage of block appended checksums is that they typically can only be utilized with disks having 520 BPS, while many storage systems support only 512 bytes per sector. In these systems, it is not possible to, for example, use a 512 BPS disk by storing 504 bytes of data and eight bytes of checksum information; rather, all 512 bytes must be allocated to data storage. In known storage system configurations that utilize 512 BPS disks, block appended checksums generally cannot be used. However, the use of 512 BPS disks may be necessary, as this may be the only bytes-per-sector value allowed by some classes of disk storage.
One known method to implement non-block appended checksums on 512 BPS disks is to store the checksum information separately, in a different storage location on the disk. For example, a set number of disk sectors could be set aside at a predetermined disk location (e.g., the last X sectors of the disk) for storing checksum information. These predetermined disk locations for storing checksum information are contiguous blocks located in a reserved area of the disk. A noted disadvantage of this technique is that, to access the data and checksum information, two separate read operations are required. Thus, to access data and its corresponding checksum information, the disk needs to locate and read the data from its physical location on the disk and then locate and read the checksum information. Similarly, when writing data, the disk must first write the actual data in its proper data sector and then write the checksum to another sector located remotely from the data sector. As the checksum sectors are physically separate from the data sectors, the disk drive head must move to and locate the appropriate sectors. The execution of multiple read/write operations, combined with continuous head movement, may significantly increase system overhead and degrade file service performance.
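The reserved-area layout described above implies a simple mapping from a data sector to the location of its checksum. The 512-byte sector and eight-byte checksum sizes below follow the figures given earlier, while the packing of checksums contiguously from a reserved starting sector is an illustrative assumption:

```python
SECTOR_SIZE = 512      # bytes per sector, per the 512 BPS case above
CHECKSUM_SIZE = 8      # bytes per checksum, per the eight-byte figure above
PER_SECTOR = SECTOR_SIZE // CHECKSUM_SIZE   # 64 checksums fit in one sector

def checksum_location(data_sector: int, reserved_start: int):
    """Return (sector, byte offset) of the checksum for `data_sector`,
    assuming checksums are packed contiguously starting at the
    hypothetical reserved sector `reserved_start`."""
    sector = reserved_start + data_sector // PER_SECTOR
    offset = (data_sector % PER_SECTOR) * CHECKSUM_SIZE
    return sector, offset
```

Because the returned sector lies in the reserved area rather than adjacent to the data, every verified access to a data sector implies a second, physically remote I/O, which is exactly the two-operation cost and head movement penalty the paragraph describes.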