A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives, organized as one or more disk arrays, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term “disk” in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
Storage of information on the disk array is illustratively implemented as one or more storage volumes of physical disks, defining an overall logical arrangement of storage space. The disks within a volume are typically organized as one or more groups, wherein each group is operated as a Redundant Array of Independent (or Inexpensive) Disks (RAID) or other suitable redundancy technique. Most RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate storing of redundant information with respect to the striped data. The redundant information enables recovery of data lost when a storage device fails.
In the operation of a disk array, it is anticipated that a disk can fail. A goal of a high performance storage system is to make the mean time to data loss (MTTDL) as long as possible, preferably much longer than the expected service life of the system. Data can be lost when one or more disks fail, making it impossible to recover data from the device. Typical schemes to avoid loss of data include mirroring, backup and/or parity protection. Mirroring is an expensive solution in terms of consumption of storage resources, such as disks. Backup does not protect data modified since the backup was created. Parity schemes are common because they provide a redundant encoding of the data that allows for a single erasure (loss of one disk) with the addition of just one disk drive to the system.
Parity protection is used in computer systems to protect against loss of data on a storage device, such as a disk. A parity value may be computed by summing (usually modulo 2) data of a particular word size (usually one bit) across a number of similar disks holding different data and then storing the results on an additional similar disk. That is, parity may be computed on vectors 1-bit wide, composed of bits in corresponding positions on each of the disks. When computed on vectors 1-bit wide, the parity can be either the computed sum or its complement; these are referred to as even and odd parity respectively. Addition and subtraction on 1-bit vectors are both equivalent to exclusive-OR (XOR) logical operations. The data is then protected against the loss of any one of the disks, or of any portion of the data on any one of the disks. If the disk storing the parity is lost, the parity can be regenerated from the data. If one of the data disks is lost, the data can be regenerated by adding the contents of the surviving data disks together and then subtracting the result from the stored parity.
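The encoding described above can be sketched in a few lines of Python. This is a minimal illustration only. It assumes that blocks are equal-length byte strings, and the helper names `parity_bit` and `parity_block` are hypothetical. Applying XOR to whole bytes computes the modulo-2 sum of eight 1-bit-wide vectors at once, one vector per bit position.

```python
from functools import reduce

def parity_bit(bits, odd=False):
    """Parity of one 1-bit-wide vector (one bit per disk).

    Even parity is the sum modulo 2; odd parity is its complement.
    """
    s = sum(bits) % 2
    return s ^ 1 if odd else s

def parity_block(blocks):
    """Even parity of whole blocks: XOR of the bytes at each offset across all blocks.

    XOR on a byte yields the modulo-2 sum of all eight bit positions at once.
    """
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# For 1-bit values, addition and subtraction modulo 2 both equal XOR.
for a in (0, 1):
    for b in (0, 1):
        assert (a + b) % 2 == (a - b) % 2 == (a ^ b)

data = [b"\x0f\x00", b"\xf0\x01", b"\x33\x10"]  # three hypothetical data disks
parity = parity_block(data)                      # b"\xcc\x11"
```

Because subtraction is also XOR, a lost data block can be regenerated with the same helper. Pass it the surviving data blocks together with the stored parity block.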
Typically, the disks are divided into parity groups, each of which comprises one or more data disks and a parity disk. A parity set is a set of blocks, including several data blocks and one parity block, where the parity block is the XOR of all the data blocks. A parity group is a set of disks from which one or more parity sets are selected. The disk space is divided into stripes, with each stripe containing one block from each disk. The blocks of a stripe are usually at the same locations on each disk in the parity group. Within a stripe, all but one of the blocks contain data (“data blocks”), and the remaining block contains parity (“parity block”) computed as the XOR of all the data blocks. If the parity blocks are all stored on one disk, thereby providing a single disk that contains all (and only) parity information, a RAID-4 implementation is provided. If the parity blocks are contained within different disks in each stripe, usually in a rotating pattern, then the implementation is RAID-5. The term “RAID” and its various implementations are well-known and disclosed in A Case for Redundant Arrays of Inexpensive Disks (RAID), by D. A. Patterson, G. A. Gibson and R. H. Katz, Proceedings of the International Conference on Management of Data (SIGMOD), June 1988, and U.S. Pat. No. 6,993,701 issued on Jan. 31, 2006 for a ROW-DIAGONAL PARITY TECHNIQUE FOR ENABLING EFFICIENT RECOVERY FROM DOUBLE FAILURES IN A STORAGE ARRAY, by Peter Corbett, et al.
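The difference between RAID-4 and RAID-5 placement can be pictured as a per-stripe map of parity (`P`) and data (`D`) blocks. The following is a sketch under stated assumptions. The rotation used for RAID-5 moves parity one disk to the left on each successive stripe. This is only one of several rotating patterns in use and is chosen here purely for illustration. The functions `parity_disk` and `layout_map` are hypothetical names.

```python
def parity_disk(stripe, num_disks, layout):
    """Index of the disk holding the parity block of a given stripe."""
    if layout == "raid4":
        return num_disks - 1  # all parity on one dedicated disk
    if layout == "raid5":
        # Hypothetical rotation: parity shifts one disk to the left per stripe.
        return (num_disks - 1) - (stripe % num_disks)
    raise ValueError(f"unknown layout: {layout}")

def layout_map(num_stripes, num_disks, layout):
    """One string per stripe, marking each disk's block 'P' (parity) or 'D' (data)."""
    rows = []
    for s in range(num_stripes):
        p = parity_disk(s, num_disks, layout)
        rows.append("".join("P" if d == p else "D" for d in range(num_disks)))
    return rows

for layout in ("raid4", "raid5"):
    print(layout, layout_map(4, 4, layout))
```

With four disks, the RAID-4 map repeats `DDDP` on every stripe. The RAID-5 map cycles through `DDDP`, `DDPD`, `DPDD` and `PDDD`, so each disk holds an equal share of the parity.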
As used herein, the term “encoding” means the computation of a redundancy value over a predetermined subset of data blocks, whereas the term “decoding” means the reconstruction of a data or parity block by using a subset of data blocks and redundancy values. If one disk fails in the parity group, the contents of that disk can be decoded (reconstructed) on a spare disk or disks by adding all the contents of the remaining data blocks and subtracting the result from the parity block. Since two's complement addition and subtraction over 1-bit fields are both equivalent to XOR operations, this reconstruction consists of the XOR of all the surviving data and parity blocks. Similarly, if the parity disk is lost, it can be recomputed in the same way from the surviving data.
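Decoding a single lost block follows directly from the preceding paragraph. Because addition and subtraction are both XOR, the missing block, whether data or parity, is simply the XOR of every surviving block in the stripe. The sketch below assumes equal-length byte-string blocks. The names `xor_blocks` and `reconstruct` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def reconstruct(stripe, failed):
    """Decode the block at index `failed` of a single-parity stripe.

    `stripe` lists every block (data and parity); the entry at `failed` is
    ignored, as if its disk were lost. The missing block is the XOR of all
    surviving blocks, regardless of whether it held data or parity.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)

data = [b"\x0f\x00", b"\xf0\x01", b"\x33\x10"]  # hypothetical data blocks
stripe = data + [xor_blocks(data)]              # last block is the parity block
for lost in range(len(stripe)):
    assert reconstruct(stripe, lost) == stripe[lost]
```

The same routine covers both cases in the text. For a lost data disk, it combines the surviving data with the parity. For a lost parity disk, it recomputes the parity from the surviving data.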
A noted disadvantage of such RAID implementations, particularly those utilizing distributed parity, e.g., RAID 5, involves the technique for mapping logical storage blocks, identified by logical block numbers such as volume block numbers (VBNs), to physical storage block locations on disk, identified by disk block numbers (DBNs). The VBNs are typically utilized by a high-level module, such as a file system, executing on the storage system, while the DBNs are typically utilized by a low-level module, such as a RAID and/or disk driver module of the system. The VBNs represent logical block locations in a logical VBN storage space typically spanning multiple disks or other physical storage devices, and the DBNs represent physical block locations in a physical DBN storage space. The noted disadvantage arises because each disk of the RAID implementation stores both data and parity blocks, and it may be exacerbated when an objective of the implementation is to support seamless disk additions. Since file systems generally only read/write data blocks (i.e., parity blocks are “hidden” from the file system), the technique utilized to map logical blocks to their physical disk block locations must be sufficiently “intelligent” to skip the parity blocks. In addition, seamless disk additions require that the mapping technique handle any incremental growth of the VBN and DBN storage spaces. To ensure a balanced/uniform distribution of parity blocks across all disks even after a disk addition (single or multiple), some physical block locations (i.e., DBNs) previously occupied by parity must store user data instead. As a result, the mapping technique must be able to handle the conversion of parity blocks to data blocks, which may be triggered, e.g., by the relocation of parity during disk addition. Conventional distributed parity architectures such as RAID 5 have generally been configured for file systems that utilize a flat, one-dimensional address storage space.
The VBN to DBN mapping techniques for these configurations have typically been simple, as these techniques do not support disk additions. Those techniques that do support disk additions typically resort to extremely expensive parity re-computation and/or block copy operations.
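The two mapping requirements described above can be illustrated with a simple mapping: the translation must skip parity blocks, and parity locations must be converted to data after a disk addition. This is a hypothetical sketch, not the technique of any particular product, and it rests on the following assumptions:

- VBNs number only data blocks, filling stripe 0 first and then stripe 1, and so on.
- Each stripe occupies the same DBN on every disk, so the DBN equals the stripe number.
- Parity follows the same illustrative left-rotating pattern on every stripe.

The helper `parity_to_data` recomputes the rotation for the enlarged array. It then reports the old parity locations that must now hold user data. This naive recomputation relocates parity on every stripe, which shows why simple schemes become expensive once disk additions are supported.

```python
def parity_disk(stripe, num_disks):
    # Hypothetical rotating (RAID-5 style) placement, one of several possible patterns.
    return (num_disks - 1) - (stripe % num_disks)

def vbn_to_dbn(vbn, num_disks):
    """Map a data-only VBN to (disk, dbn), stepping over each stripe's parity block.

    Assumes VBNs fill the data blocks of stripe 0 first, then stripe 1, etc.,
    and that a stripe sits at the same DBN on every disk (DBN == stripe number).
    """
    data_per_stripe = num_disks - 1
    stripe, slot = divmod(vbn, data_per_stripe)
    p = parity_disk(stripe, num_disks)
    disk = slot if slot < p else slot + 1  # skip the parity disk
    return disk, stripe

def parity_to_data(num_stripes, old_disks, new_disks):
    """(disk, dbn) locations that held parity before a disk addition but hold data after it."""
    changed = []
    for s in range(num_stripes):
        old_p = parity_disk(s, old_disks)
        if parity_disk(s, new_disks) != old_p:
            changed.append((old_p, s))
    return changed

print([vbn_to_dbn(v, 4) for v in range(6)])
print(parity_to_data(4, 4, 5))
```

A production mapping would aim to limit how many stripes change parity location when a disk is added. The sketch deliberately makes no attempt to do so.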
However, disk topology aware file systems, such as the Write Anywhere File Layout (WAFL®) file system available from NetApp of Sunnyvale, Calif., may exploit knowledge of the layout of a disk array to implement highly efficient write operations. Typically, disk topology aware file systems utilize RAID 4 implementations that store the parity on dedicated disk drives, thereby allowing the file system to remain unaware of the dedicated parity disks. A RAID 4 implementation works well because a disk topology aware file system can implement efficient write operations (e.g., efficient stripe updates) in which the cost of updating parity is amortized over many updates to data blocks in nearby stripes.
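The amortization argument can be made concrete with a simplified I/O cost model. Under the stated assumptions, the model shows that writing many blocks of a stripe together is far cheaper per block than writing blocks one at a time. The model assumes a single-parity stripe of `n_data` data blocks and counts every block read or write as one I/O. It also assumes that the cheaper of two common parity-update methods is chosen. The first method reads the old data and old parity (the subtraction method). The second reads the untouched data blocks and recomputes parity (the recalculation method). The function name `stripe_write_ios` is hypothetical.

```python
def stripe_write_ios(k, n_data):
    """Disk I/Os to write k of the n_data data blocks in one single-parity stripe.

    Simplified model (an assumption for illustration): each block read or
    written costs one I/O, and the cheaper parity-update method is used.
    """
    if not 1 <= k <= n_data:
        raise ValueError("k must be between 1 and n_data")
    # Subtraction: read k old data blocks + old parity, write k new blocks + new parity.
    subtraction = (k + 1) + (k + 1)
    # Recalculation: read the n_data - k untouched blocks, write k new blocks + parity.
    recalculation = (n_data - k) + (k + 1)
    return min(subtraction, recalculation)

for k in (1, 2, 4, 8):
    ios = stripe_write_ios(k, 8)
    print(f"{k} block(s): {ios} I/Os, {ios / k:.3f} per block written")
```

With eight data disks, a lone block update costs four I/Os. A full-stripe write of all eight blocks costs nine, because the single parity write is shared by every block in the stripe.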
A distributed parity organization, on the other hand, e.g., RAID-5, has the advantage of providing higher IOPS, since all disk spindles are available for read operations. The obvious approach for implementing a distributed parity layout in a disk topology aware file system is to include parity blocks within the VBN space. With this scheme, expansion by disk addition is easy. However, this approach suffers from several shortcomings. First, it limits the VBN space that can be used for client data, since part of the address space is consumed by parity, thereby restricting the size of the maximum aggregate or flexvol that can be created. Second, file system management becomes complicated, since constructs such as the allocation maps, active map, summary map, etc., must now be aware of parity blocks and must appropriately account for them when processing user operations. Finally, backup operations that use snapshots as the underlying mechanism require identical source and destination geometries, thus severely limiting configurations.
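A rough model can quantify the first shortcoming, parity consuming part of the VBN space. It assumes a hypothetical fixed-width VBN space of `2**vbn_bits` addresses, in which every stripe of `num_disks` blocks contributes one parity VBN. Under that assumption, roughly `1/num_disks` of the address space is lost to parity. The function name `client_vbns` and the treatment of a trailing partial stripe are illustrative choices, not taken from the text.

```python
def client_vbns(vbn_bits, num_disks):
    """VBNs left for client data if parity blocks are numbered inside the VBN space.

    Hypothetical model: the VBN space holds 2**vbn_bits addresses, and every
    stripe spans num_disks blocks, one of which is parity. A trailing partial
    stripe is assumed to need its own parity block as well.
    """
    if num_disks < 2:
        raise ValueError("single-parity striping needs at least two disks")
    total = 2 ** vbn_bits
    full_stripes, leftover = divmod(total, num_disks)
    return full_stripes * (num_disks - 1) + max(leftover - 1, 0)

for disks in (4, 8, 16):
    usable = client_vbns(32, disks)
    print(disks, usable, f"{1 - usable / 2 ** 32:.2%} of VBN space used by parity")
```

The loss shrinks as disks are added but never disappears. The maximum aggregate size therefore stays below what the address width alone would allow.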