The present invention relates generally to redundant arrays of independent disks (RAID) and, more particularly, to a system and method for handling temporary errors on a redundant array of independent tapes (RAIT).
A data storage array is a collection of storage elements that are accessible by a host computer as a single storage unit. The individual storage elements can be any type, or a combination of types, of storage devices such as hard disk drives, semiconductor memory, optical disk drives, magnetic tape drives, and the like. A common storage array comprises a plurality of hard disk drives, i.e., a disk array.
A disk array includes a collection of disks and a disk array controller. The controller controls the operation of the disks and presents them as a virtual disk to a host operating environment. The host operating environment is typically a host computer that executes an operating system and application programs. A virtual disk is an abstract entity realized by the controller and the disk array. A virtual disk is functionally identical to a physical disk from the standpoint of application software executing on the host computer.
One such disk array is a redundant array of independent disks (RAID). RAID comes in various operating levels which range from RAID level 0 (RAID-0) to RAID level 6 (RAID-6). Additionally, there are multiple combinations of the various RAID levels that form hybrid RAID levels such as RAID-5+, RAID-6+, RAID-10, RAID-53 and so on. Each RAID level represents a different form of data management and data storage within the RAID disk array.
In a RAID-5 array, data is generally mapped to the various physical disks in data “stripes” across the disks and vertically in a “strip” within a single disk. To facilitate data storage, a serial data stream is partitioned into blocks of data; the size of each block is generally defined by the host operating environment. Typically, one or more blocks of data are stored together to form a “chunk” or “segment” of data at an address within a given disk. Each chunk is stored on a different disk as the data is striped across the disks. Once all the disks in a stripe have been given chunks of data, the storage process returns to the first disk in the stripe, and stripes across all the disks again. As such, the input data stream is stored in a raster scan pattern onto all the disks in the array.
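The raster scan pattern described above can be illustrated with a minimal Python sketch. It assumes chunks and disks are numbered from zero and ignores parity placement entirely; the function name `chunk_location` is hypothetical and is not taken from the specification.

```python
from typing import Tuple


def chunk_location(chunk_index: int, num_disks: int) -> Tuple[int, int]:
    """Return (disk, stripe) for a chunk under plain round-robin striping.

    Assumption: zero-based numbering, one chunk per disk per stripe,
    and no parity strip (parity placement is not modeled here).
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    if chunk_index < 0:
        raise ValueError("chunk_index must be non-negative")
    # After the last disk in a stripe is filled, wrap to the first disk
    # and move down to the next stripe.
    stripe, disk = divmod(chunk_index, num_disks)
    return disk, stripe


# Six consecutive chunks laid out across four disks.
layout = [chunk_location(i, 4) for i in range(6)]
```

Here the fifth chunk (index 4) wraps back to disk 0 in stripe 1, which is the return to the first disk that the paragraph describes.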
In a RAID-5 array, data consistency and redundancy are assured using parity data that is distributed amongst all the disks. Specifically, a RAID-5 array contains N member disks. Each stripe of data contains N−1 data strips and one parity strip. The parity segments of the array are distributed across the array members, usually in cyclic patterns. For example, in an array containing five disks, the first parity strip is located on member disk four, the second parity strip on member disk three, the third parity strip on member disk two, and so on.
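One cyclic pattern consistent with the five-disk example above can be sketched as follows. This is an illustration only: it assumes member disks are numbered 0 through N-1 and that the parity strip moves back by one member per stripe, wrapping around after member 0. The names `parity_member` and `stripe_layout` are hypothetical.

```python
from typing import List


def parity_member(stripe: int, n: int) -> int:
    """Member disk holding the parity strip for a given stripe.

    Assumption: members are numbered 0..n-1 and parity rotates
    backwards by one member per stripe, starting at member n-1.
    """
    return (n - 1 - stripe) % n


def stripe_layout(stripe: int, n: int) -> List[str]:
    """Mark each member of a stripe as data ("D") or parity ("P")."""
    p = parity_member(stripe, n)
    return ["P" if member == p else "D" for member in range(n)]


# First three stripes of a five-disk array: parity on members 4, 3, 2.
first_three = [parity_member(s, 5) for s in range(3)]
```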
RAID-5 parity is generated using an exclusive OR (XOR) function. In general, parity data is generated by taking an XOR function of the user data strips within a given data stripe. Using the parity information, the contents of any strip of data on any single one of the data disks in the array can be regenerated from the contents of the corresponding strips on the remaining disks in the array. That is, if the XOR of the contents of all corresponding blocks in the data stripe except one is computed, the result is the content of the remaining block. Thus, if disk three in the five-disk array should fail, for example, the data it contains can still be delivered to applications by reading corresponding blocks from all the surviving members and computing the XOR of their contents. As such, the RAID-5 array is said to be fault tolerant, i.e., the loss of one disk in the array does not impact data availability.
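The XOR property described above can be checked with a short Python sketch. The strip contents are made-up sample bytes and the helper name `xor_strips` is hypothetical; the sketch only demonstrates that XORing the surviving strips with the parity strip regenerates the missing one.

```python
from functools import reduce
from typing import Iterable


def xor_strips(strips: Iterable[bytes]) -> bytes:
    """Bytewise XOR of equal-length strips."""
    strips = list(strips)
    if not strips:
        raise ValueError("at least one strip is required")
    length = len(strips[0])
    if any(len(s) != length for s in strips):
        raise ValueError("all strips must have the same length")
    # zip(*strips) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# Four data strips of one stripe in a five-disk array (sample values).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]
parity = xor_strips(data)

# Simulate losing the strip at index 2, then rebuild it from the
# surviving data strips plus the parity strip.
lost = 2
survivors = [s for i, s in enumerate(data) if i != lost] + [parity]
rebuilt = xor_strips(survivors)
```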
A problem with typical data storage element arrays is that in the event of a failed data storage element, data communication to all of the storage elements is stopped until the failed storage element executes its error recovery. The error recovery involves using the contents from the other storage elements to reconstruct the contents of the failed storage element. The probability of the data being corrected is high. However, a failed storage element can be unresponsive for significant periods. Consequently, the reading and writing of data from and to the data storage array are slowed.
What is needed are a method and system for handling temporary errors in a data storage element array that continuously communicate data to and from the array while a storage element is in error recovery.
Accordingly, it is an object of the present invention to provide a method and system for storing data in an array of storage elements arranged in parallel having data storage elements and redundant information storage elements in which data to be written to one of the data storage elements is written to one of the redundant information storage elements in place of redundant information if the one of the data elements is unresponsive for receiving data.
It is another object of the present invention to provide a method and system for storing data in an array of storage elements arranged in parallel having data storage elements and redundant information storage elements in which data is written to one of the redundant information storage elements in place of redundant information as long as one of the data elements is unresponsive for receiving data.
In carrying out the above objects and other objects, the present invention provides a method for storing data in a storage system having N storage elements arranged in parallel for concurrent access, where N is an integer greater than three. The method includes determining first redundancy information based on a first row of data to be striped across N−2 storage elements. It is then determined which of the N−2 storage elements are responsive for receiving data. The first row of data is then striped across the responsive storage elements of the N−2 storage elements if at least N−3 of the N−2 storage elements are responsive for receiving data. The first redundancy information is then written to the N−1 storage element. The data to be received by one of the N−2 storage elements is then written to the Nth storage element if the one of the N−2 storage elements is unresponsive for receiving data. The first redundancy information is written to the Nth storage element if all of the N−2 storage elements are responsive for receiving data.
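The steps of this method for a single row can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not stated in the summary: the redundancy information is modeled as a bytewise XOR of the row, storage elements are represented only by what they receive, and a row with more than one unresponsive data element is rejected with an exception. The names `xor_chunks` and `write_row` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_chunks(chunks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length chunks (assumed form of redundancy)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def write_row(row: List[bytes], responsive: List[bool], n: int) -> List[Optional[bytes]]:
    """Return what each of the n elements receives for one row.

    Indices 0..n-3 are the N-2 data elements, index n-2 is the N-1
    element, and index n-1 is the Nth element. None means the element
    received nothing for this row.
    """
    if n <= 3:
        raise ValueError("N must be greater than three")
    if len(row) != n - 2 or len(responsive) != n - 2:
        raise ValueError("row and responsive must each have N-2 entries")
    down = [i for i, ok in enumerate(responsive) if not ok]
    if len(down) > 1:
        # Assumption: the summary only covers at least N-3 responsive elements.
        raise RuntimeError("fewer than N-3 data elements are responsive")
    redundancy = xor_chunks(row)
    # Stripe the row across the responsive data elements only.
    written: List[Optional[bytes]] = [c if ok else None for c, ok in zip(row, responsive)]
    # The N-1 element always receives the redundancy information.
    written.append(redundancy)
    # The Nth element receives the displaced chunk if a data element is
    # unresponsive, and the redundancy information otherwise.
    written.append(row[down[0]] if down else redundancy)
    return written


row = [b"\x01", b"\x02", b"\x04"]
all_up = write_row(row, [True, True, True], n=5)
one_down = write_row(row, [True, False, True], n=5)
```

In the `one_down` case the chunk intended for the second data element lands on the Nth element, while the N-1 element still carries redundancy information for the full row.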
In one embodiment, the method further includes determining second redundancy information based on a second row of data to be striped across the N−2 storage elements. The second row of data is then striped across the responsive storage elements of the N−2 storage elements if at least N−3 of the N−2 storage elements are responsive for receiving data. The second redundancy information is then written to the N−1 storage element. The data of the second row to be received by one of the N−2 storage elements is then written to the Nth storage element if the one of the N−2 storage elements is unresponsive for receiving data.
In another embodiment, the method further includes determining second redundancy information based on a second row of data to be striped across the N−2 storage elements and then determining if the one of the N−2 storage elements is still unresponsive for receiving data. The second row of data is then striped across the responsive storage elements of the N−2 storage elements if at least N−3 of the N−2 storage elements are responsive for receiving data. The second redundancy information is then written to the N−1 storage element. The data of the second row to be received by the one of the N−2 storage elements is then written to the Nth storage element if the one of the N−2 storage elements is still unresponsive for receiving data.
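The per-row re-check in this embodiment can be sketched as a small planning loop. The sketch assumes a hypothetical probe `is_responsive(element, row)` that reports whether a data element can receive data when a given row is written; how responsiveness is actually detected is not specified here. The function name `plan_rows` is likewise hypothetical.

```python
from typing import Callable, List, Optional


def plan_rows(
    num_rows: int,
    n: int,
    is_responsive: Callable[[int, int], bool],
) -> List[Optional[int]]:
    """For each row, decide what the Nth element receives.

    Returns, per row, the index of the data element whose chunk is
    redirected to the Nth element, or None when every data element is
    responsive and the Nth element receives redundancy information.
    """
    plan: List[Optional[int]] = []
    for row in range(num_rows):
        # Responsiveness is re-evaluated for every row, so an element
        # that finishes error recovery rejoins the stripe on the next row.
        down = [e for e in range(n - 2) if not is_responsive(e, row)]
        if len(down) > 1:
            # Assumption: only the single-unresponsive-element case is covered.
            raise RuntimeError("fewer than N-3 data elements are responsive")
        plan.append(down[0] if down else None)
    return plan


# Hypothetical scenario: data element 1 is in error recovery while
# rows 1 and 2 are written, and is responsive otherwise.
def probe(element: int, row: int) -> bool:
    return not (element == 1 and row in (1, 2))


plan = plan_rows(num_rows=4, n=5, is_responsive=probe)
```

With this scenario the Nth element holds redundancy information for rows 0 and 3 and holds the chunk of data element 1 for rows 1 and 2.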
Further, in carrying out the above objects and other objects, the present invention provides a system in accordance with the above described method.
The advantages accruing to the present invention are numerous. Data can be continuously communicated to and from the storage element array while a storage element is in error recovery because the unresponsive storage element is dropped from the other storage elements for receiving data until error recovery is complete and the storage element is responsive for receiving data. While the storage element is unresponsive, one of the storage elements scheduled to receive redundant (parity) information receives the data to be written to the unresponsive storage element instead of the redundant information. Accordingly, communication to the storage elements never needs to be stopped while the failed storage element executes its error recovery. Further, error recovery can still be accomplished by using the redundant information of other redundant information storage elements.
The above objects and other objects, features, and advantages of embodiments of the present invention are readily apparent from the following detailed description of the best mode for carrying out the present invention when taken in connection with the accompanying drawings.