A storage server (also known as a “filer”) is a computer that provides storage services, in both network attached storage (NAS) and storage area network (SAN) environments, relating to the organization of information on storage devices, such as disks. The filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of disk blocks configured to store information, whereas a directory may be implemented as a specially-formatted file in which information about other files and directories is stored. A filer may be configured to operate according to a client/server model of information delivery to allow many clients to access files stored on the filer. In this model, the client may include an application, such as a file system protocol, executing on a computer that connects to the filer over a computer network. The computer network can include, for example, a point-to-point link, a shared local area network (LAN), a wide area network (WAN), or a virtual private network (VPN) implemented over a public network such as the Internet. Each client may request filer services by issuing file system protocol messages (in the form of packets) to the filer over the network.
A common type of file system is a “write in-place” file system, in which the locations of the data structures (such as inodes and data blocks) on disk are typically fixed. An inode is a data structure used to store information, such as metadata, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include information relating to: ownership of the file, access permissions for the file, the size of the file, the file type, and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
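The inode layout described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the field names, the choice of 12 direct pointers, the single level of indirection, and the `indirect_blocks` dictionary standing in for on-disk indirect blocks are hypothetical and are not taken from any particular file system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NUM_DIRECT = 12  # hypothetical number of direct pointers held in the inode


@dataclass
class Inode:
    owner: int                      # ownership of the file
    permissions: int                # access permissions, e.g. 0o644
    size: int                       # size of the file in bytes
    file_type: str                  # e.g. "regular" or "directory"
    direct: List[int] = field(default_factory=list)  # disk block numbers
    indirect: Optional[int] = None  # disk block number of an indirect block


def resolve(inode: Inode, file_block: int,
            indirect_blocks: Dict[int, List[int]]) -> int:
    """Map a block index within the file to a disk block number."""
    if file_block < NUM_DIRECT:
        return inode.direct[file_block]
    if inode.indirect is None:
        raise IndexError("file block is beyond the end of the file")
    # Larger files are reached through the indirect block's pointer list.
    return indirect_blocks[inode.indirect][file_block - NUM_DIRECT]


# A file with 14 blocks: 12 reached directly, 2 through one indirect block.
indirect_blocks = {500: [900, 901]}
inode = Inode(owner=1000, permissions=0o644, size=14 * 4096,
              file_type="regular", direct=list(range(100, 112)), indirect=500)
```

In a write in-place design, extending this file would allocate one more data block and append its number to the indirect block's list, while the existing pointers keep referring to the same fixed disk locations.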
Another type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block on disk is read from disk into memory and “dirtied” with new data, the data block is written to a new location on the disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout, such that the data is substantially contiguously arranged on the disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations. A particular example of a write-anywhere file system is the Write Anywhere File Layout (WAFL®) file system available from Network Appliance, Inc. The WAFL file system is implemented within a microkernel as part of the overall protocol stack of the filer and associated disk storage. This microkernel is supplied as part of Network Appliance's Data ONTAP® storage operating system, residing on the filer, that processes file service requests from network-attached clients.
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a storage system that manages data access. In the case of a filer, the storage operating system may implement file system semantics, as does the Data ONTAP® storage operating system. The storage operating system can also be implemented as an application program operating on a general-purpose operating system, such as UNIX® or Windows®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
Disk storage is typically implemented as one or more storage “volumes” that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes. Each volume is associated with its own file system and, as used herein, the terms “volume” and “file system” are interchangeable.
The disks within a volume can be organized as a Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability and integrity of data storage through the writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate storing of parity information with respect to the striped data. In the example of a WAFL® file system, a RAID 4 implementation is advantageously employed, which entails striping data across a group of disks, and storing the parity within a separate disk of the RAID group. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
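As a rough illustration of how dedicated parity protects a stripe, the following sketch computes bytewise XOR parity over the data blocks of one stripe and rebuilds a lost block from the survivors. The block contents and the helper name `xor_blocks` are hypothetical; a real RAID 4 implementation works on disk-sized blocks and handles many details omitted here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (contents are made up).
stripe = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is what would be stored on the separate parity disk.
parity = xor_blocks(stripe)

# If the second data disk is lost, XOR of the survivors and the parity
# yields the missing block.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
```

Because XOR is its own inverse, the same helper serves both to compute the parity and to reconstruct any single missing block of the stripe.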
During the execution of a disk read operation, a host bus adapter in a storage adapter transfers data from the disk storage interface to a host buffer memory in the storage adapter. Under certain conditions and in rare cases, the host bus adapter will mismanage the memory buffer pointers in such a way as to effectively drop a portion of the read data stream.
FIG. 1 illustrates the case of two data frames being received from a disk by the host bus adapter during a read operation. The two frames comprise a 15-sector read from disk, and the data is to be written into six non-contiguous host memory buffers. Buffer #2, a 2048-byte buffer, spans the two frames in that the last three sectors of the first frame and the first sector of the second frame are to be written into it.
After completely receiving the first data frame, the pointer into buffer #2, as maintained by the host bus adapter, should be pointing at an offset three sectors into the buffer. Suppose that, because of a defect in the host bus adapter, when the adapter starts to receive the second frame it “forgets” the offset into buffer #2 and rewinds the pointer to begin at the start of the buffer. The resulting data placement would appear as shown in FIG. 2, as if the read data were shifted upwards by three sectors, with the last three sectors of host memory remaining unwritten. Another type of error can occur, wherein no data is read (this is known as a “short read”).
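The pointer-rewind defect can be reproduced in a small simulation. This is only a sketch under stated assumptions: the 512-byte sector size, the sizes of every buffer other than the 2048-byte buffer #2, and the 5-sector/10-sector frame split are hypothetical values chosen so that buffer #2 receives three sectors from the first frame and one from the second. The function names are likewise made up.

```python
SECTOR = 512  # assumed sector size in bytes


def scatter_read(frames, buffer_sizes, rewind_bug=False):
    """Copy incoming frames into a list of non-contiguous host buffers."""
    buffers = [bytearray(size) for size in buffer_sizes]
    buf_idx, offset = 0, 0
    for frame_no, frame in enumerate(frames):
        if rewind_bug and frame_no > 0:
            offset = 0  # the defect: the offset into the current buffer is lost
        pos = 0
        while pos < len(frame):
            room = len(buffers[buf_idx]) - offset
            n = min(room, len(frame) - pos)
            buffers[buf_idx][offset:offset + n] = frame[pos:pos + n]
            pos += n
            offset += n
            if offset == len(buffers[buf_idx]):
                buf_idx, offset = buf_idx + 1, 0
    return buffers


def sector_tags(buffers):
    """Return the first byte of every sector, in host memory order."""
    flat = b"".join(bytes(b) for b in buffers)
    return [flat[i] for i in range(0, len(flat), SECTOR)]


# Sector k (1..15) is filled with the byte value k; 0 means "never written".
sectors = [bytes([k]) * SECTOR for k in range(1, 16)]
frames = [b"".join(sectors[:5]), b"".join(sectors[5:])]
buffer_sizes = [s * SECTOR for s in (2, 4, 2, 2, 2, 3)]  # buffer #2 is 2048 bytes

good = sector_tags(scatter_read(frames, buffer_sizes))
bad = sector_tags(scatter_read(frames, buffer_sizes, rewind_bug=True))
# bad == [1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 0, 0]
```

In the faulty run, sectors 3 to 5 are overwritten, everything from the second frame lands three sectors too early, and the final three sectors of host memory are never written.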
In general terms, if there is a data stream that is to have a predetermined amount of data read into it (for example, 64K), the system will allocate 64K of buffer memory to hold the data to be read. The system operates under the assumption that when the read I/O is complete, the buffer will contain 64K of data, since that was the amount requested. If the buffer does not contain 64K of data for any reason, the system knows that there was an error. There is therefore a need to be able to detect data read errors where an allocated buffer memory is not completely filled by a read I/O.
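One conceivable way to notice that a buffer was not completely filled is to write a known marker at the end of the buffer before the read is issued, and to check afterwards whether the marker survived. The sketch below illustrates that idea only; it is an assumption made for illustration, not a description of any particular system's method. The marker value and the function names `arm` and `is_incomplete` are hypothetical.

```python
SENTINEL = b"\xde\xad\xbe\xef" * 2  # hypothetical 8-byte marker


def arm(buffer):
    """Stamp the marker over the last bytes of the buffer before the read."""
    buffer[-len(SENTINEL):] = SENTINEL


def is_incomplete(buffer):
    """True if the marker is still present, so the tail was never written."""
    return bytes(buffer[-len(SENTINEL):]) == SENTINEL


REQUESTED = 64 * 1024  # the 64K read from the text

# A read that fills the whole buffer overwrites the marker.
full = bytearray(REQUESTED)
arm(full)
full[:] = b"\x11" * REQUESTED

# A read that stops 4K early leaves the marker in place.
short = bytearray(REQUESTED)
arm(short)
short[:REQUESTED - 4096] = b"\x11" * (REQUESTED - 4096)
```

A check of this kind is cheap because it inspects only a few bytes, but it can raise a false alarm if the genuine data happens to end with the marker, so a caller would treat a positive result as a reason to verify further rather than as proof of an error.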
Existing methods for detecting data corruption include the use of block-checksumming, which can be used to detect any kind of data corruption originating anywhere in the storage subsystem. However, recovering from data corruption is typically expensive and alarming to a user of the storage subsystem, because the subsystem usually assumes that the corruption has been caused by a fault in the disks themselves, and a typically recommended user action is to replace the supposedly faulty disk. In cases where there are known sources of corruption in the storage subsystem, e.g., in a host bus adapter, that are responsible for transient read corruption (such as the upward sector shift), it is advantageous to be able to quickly and cheaply detect that a transient read event occurred. A simple solution is to retry the failed I/O, instead of alerting upper layers of software to undergo an expensive recovery procedure and potentially target the wrong storage component (e.g., a disk) for replacement when such a replacement is not necessary.
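The retry approach can be sketched as a small wrapper that reissues the read a bounded number of times before reporting a failure to upper layers. The names `read_with_retry`, `read_fn`, and `verify_fn`, and the default of three retries, are hypothetical choices made for illustration.

```python
def read_with_retry(read_fn, verify_fn, max_retries=3):
    """Reissue a read until it verifies, or give up after max_retries retries.

    Returns the data and the number of retries that were needed.
    """
    for attempt in range(1 + max_retries):
        data = read_fn()
        if verify_fn(data):
            return data, attempt
    # Only now is the failure escalated to the more expensive recovery path.
    raise OSError("read failed verification after %d retries" % max_retries)


# Example: a reader that returns bad data once, then good data.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    return b"\x00" * 4 if calls["count"] == 1 else b"\x01" * 4


data, retries = read_with_retry(flaky_read, lambda d: d == b"\x01" * 4)
```

Bounding the number of retries keeps a persistent fault, such as a genuinely failing disk, from being masked indefinitely: once the limit is reached, the error still reaches the layers that perform the full recovery.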