A storage system typically comprises one or more storage devices into which data may be entered, and from which data may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
Storage of information on the disk array is preferably implemented as one or more storage “volumes”, defining an overall logical arrangement of disk space. The disks within a volume are typically organized as one or more groups, wherein each group is operated as a Redundant Array of Independent (or Inexpensive) Disks (RAID). Most RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate storing of redundant information with respect to the striped data. The redundant information may thereafter be retrieved to enable recovery of data lost when a storage device fails.
In the operation of a disk array, it is anticipated that a disk can fail. A goal of a high performance storage system is to make the mean time to data loss as long as possible, preferably much longer than the expected service life of the system. Data can be lost when one or more disks fail, making it impossible to recover data from the device. Typical schemes to avoid loss of data include mirroring, backup and parity protection. Mirroring stores the same data on two or more disks so that if one disk fails, the “mirror” disk(s) can be used to serve (e.g., read) data. Backup periodically copies data on one disk to another disk. Parity schemes are common because they provide a redundant encoding of the data that allows for loss of one or more disks without the loss of data, while requiring a minimal number of disk drives in the storage system.
Parity protection is used in computer systems to protect against loss of data on a storage device, such as a disk. A parity value may be computed by summing (usually modulo 2) data of a particular word size (usually one bit) across a number of similar disks holding different data and then storing the results on the disk(s). That is, parity may be computed on 1-bit wide vectors, composed of bits in predetermined positions on each of the disks. Addition and subtraction on 1-bit vectors are an equivalent to exclusive-OR (XOR) logical operations; these addition and subtraction operations can thus be replaced by XOR operations. The data is then protected against the loss of any one of the disks, or of any portion of the data on any one of the disks. If the disk storing the parity is lost, the parity can be regenerated from the data. If one of the data disks is lost, the data can be regenerated by adding the contents of the surviving data disks together and then subtracting the result from the stored parity.
Typically, the disks are divided into parity groups, each of which comprises one or more data disks and a parity disk. The disk space is divided into stripes, with each stripe containing one block from each disk. The blocks of a stripe are usually at equivalent locations on each disk in the parity group. Within a stripe, all but one block contain data (“data blocks”) with the one block containing parity (“parity block”) computed by the XOR of all the data. If the parity blocks are all stored on one disk, thereby providing a single disk that contains all (and only) parity information, a RAID-4 implementation is provided. If the parity blocks are contained within different disks in each stripe, usually in a rotating pattern, then the implementation is RAID-5. The term “RAID” and its various implementations are well-known and disclosed in A Case for Redundant Arrays of Inexpensive Disks (RAID), by D. A. Patterson, G. A. Gibson and R. H. Katz, Proceedings of the International Conference on Management of Data (SIGMOD), June 1988.
The storage operating system of the storage system typically includes a RAID subsystem that manages the storage and retrieval of information to and from the disks in accordance with input/output (I/O) operations. Configuration management in the RAID subsystem generally involves a defined set of modifications to the topology or attributes associated with a storage array, such as a disk, a RAID group, a volume or set of volumes. Examples of these modifications include, but are not limited to, disk addition, disk failure handling, volume splitting, volume online/offline and changes to (default) RAID group size or checksum mechanism.
In order for certain types of configuration management operations to work correctly, it is necessary to ensure that no concurrent I/O operations are underway (i.e., “in flight”) in the RAID subsystem. I/O operations typically have “knowledge” of the RAID topology, which is often embedded into the state associated with individual I/O operational units or I/O tasks. A change to the topology while these tasks are processing data can have an undefined, possibly damaging, effect. One approach to ensuring that no in flight operations are executing during configuration management is to make each type of I/O operation “inspect” the configuration state each time it is restarted after a suspension in order to determine whether changes have occurred. In this context, “suspension” denotes cooperative deferral of processing of an I/O operation based on a condition. The problem with this approach is that “guarding” for a configuration change in each type of I/O operation, given that certain operations log information or have partially committed state, is difficult and error prone.