A storage system typically comprises one or more storage devices into which data may be entered, and from which data may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term “disk” in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
Storage of information on the disk array is preferably implemented as one or more storage “volumes”, defining an overall logical arrangement of disk space. The disks within a volume are typically organized as one or more groups, wherein each group is operated as a Redundant Array of Independent (or Inexpensive) Disks (RAID). Most RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate storing of redundant information with respect to the striped data. The redundant information may thereafter be retrieved to enable recovery of data lost when a storage device fails.
In the operation of a disk array, it is anticipated that a disk can fail. A goal of a high performance system is to make the mean time to data loss as long as possible, preferably much longer than the expected service life of the system. Data can be lost when one or more disks fail, making it impossible to recover data from the device. Typical schemes to avoid loss of data include mirroring, backup and parity protection. Mirroring stores the same data on two or more disks so that if one disk fails, the “mirror” disk(s) can be used to serve (e.g., read) data. Backup periodically copies data on one disk to another disk. Parity schemes are common because they provide a redundant encoding of the data that allows for loss of one or more disks without the loss of data, while requiring a minimal number of disk drives in the storage system.
Parity protection is used in computer systems to protect against loss of data on a storage device, such as a disk. A parity value may be computed by summing (usually modulo 2) data of a particular word size (usually 1 bit) across a number of similar disks holding different data and then storing the results on one or more additional disks. That is, parity may be computed on 1-bit wide vectors, composed of bits in predetermined positions on each of the disks. Addition and subtraction on 1-bit vectors are equivalent to exclusive-OR (XOR) logical operations; these addition and subtraction operations can thus be replaced by XOR operations. The data is then protected against the loss of any one of the disks, or of any portion of the data on any one of the disks. If the disk storing the parity is lost, the parity can be regenerated from the data. If one of the data disks is lost, the data can be regenerated by adding the contents of the surviving data disks together and then subtracting the results from the stored parity.
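The parity computation and single-disk recovery described above can be illustrated with a minimal sketch. The helper name `xor_blocks` and the two-byte sample blocks are hypothetical and chosen only for illustration; the sketch assumes every disk contributes a block of the same size.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR (sum modulo 2 per bit) of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    # zip(*blocks) groups the bytes found at the same position on every disk.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data disks (one two-byte block each).
data_disks = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x0f"]

# Parity is the XOR of all data blocks.
parity = xor_blocks(data_disks)

# Simulate the loss of data disk 1, then rebuild it from the survivors
# and the stored parity. With XOR, "adding" and "subtracting" coincide.
lost_index = 1
survivors = [d for i, d in enumerate(data_disks) if i != lost_index]
recovered = xor_blocks(survivors + [parity])

# A lost parity disk is regenerated simply by recomputing it from the data.
regenerated_parity = xor_blocks(data_disks)
```

Because XOR is its own inverse, the same routine serves both to generate parity and to reconstruct a missing block, which is why a single parity block tolerates the loss of any one disk but no more.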
Typically, the disks are divided into parity groups, each of which comprises one or more data disks and a parity disk. The disk space is divided into stripes, with each stripe containing one block from each disk. The blocks of a stripe are usually at equivalent locations on each disk in the parity group. Within a stripe, all but one block contain data (“data blocks”) with the one block containing parity (“parity block”) computed by the XOR of all the data. If the parity blocks are all stored on one disk, thereby providing a single disk that contains all (and only) parity information, a RAID-4 implementation is provided. If the parity blocks are contained within different disks in each stripe, usually in a rotating pattern, then the implementation is RAID-5. The term “RAID” and its various implementations are well-known and disclosed in A Case for Redundant Arrays of Inexpensive Disks (RAID), by D. A. Patterson, G. A. Gibson and R. H. Katz, Proceedings of the International Conference on Management of Data (SIGMOD), June 1988.
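The difference between the two parity placements can be sketched as follows. The function names, the choice of the last disk as the dedicated RAID-4 parity disk, and the particular rotation order used for RAID-5 are assumptions made for illustration; real implementations may rotate parity differently.

```python
def parity_disk(stripe, num_disks, level):
    """Return the index of the disk holding the parity block for a stripe."""
    if num_disks < 2:
        raise ValueError("a parity group needs at least two disks")
    if level == 4:
        # RAID-4: one dedicated disk holds all (and only) parity.
        # Assumption: the dedicated parity disk is the last one.
        return num_disks - 1
    if level == 5:
        # RAID-5: parity moves to a different disk in each stripe.
        # Assumption: rotate backwards, starting from the last disk.
        return (num_disks - 1 - stripe) % num_disks
    raise ValueError("only RAID levels 4 and 5 are sketched here")


def stripe_layout(stripe, num_disks, level):
    """Return one label per disk: 'P' for the parity block, 'D' for data."""
    p = parity_disk(stripe, num_disks, level)
    return ["P" if disk == p else "D" for disk in range(num_disks)]


# Layouts for the first four stripes of a hypothetical four-disk parity group.
raid4 = [stripe_layout(s, 4, 4) for s in range(4)]
raid5 = [stripe_layout(s, 4, 5) for s in range(4)]
```

Under these assumptions every RAID-4 stripe places its parity block on disk 3, whereas the RAID-5 stripes place it on disks 3, 2, 1 and 0 in turn, so that no single disk holds all of the parity.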
The storage operating system of the storage system typically includes a RAID subsystem that manages the storage and retrieval of information to and from the disks of a storage array in accordance with input/output (I/O) operations. In addition, the storage operating system includes administrative interfaces, such as a user interface, that enable operators (system administrators) to access the system in order to implement, e.g., configuration management decisions. Configuration management in the RAID subsystem generally involves a defined set of modifications to the topology or attributes (i.e., configuration) associated with a storage array, such as a disk, a RAID group, a volume or set of volumes. Examples of these modifications include, but are not limited to, disk failure handling, volume splitting, volume online/offline, changes to (default) RAID group size or checksum mechanism, and disk addition.
Typically, configuration management requests are issued through a user interface oriented towards operators that are knowledgeable about the underlying physical aspects of the system. That is, the interface is often adapted towards physical disk structures and management that the operators may manipulate in order to present a view of the storage system on behalf of a client. Often, it is desirable to know the outcome of a proposed configuration operation prior to issuing a configuration management request and being forced to live with the results. For example, it is desirable to understand disk allocation and placement decisions that will be made by the RAID subsystem when a request is made to add one or more disks to a volume. Once the disks have been added to the volume, it is difficult to “undo” the addition without taking drastic measures.