Redundant Array of Independent (or Inexpensive) Disks (RAID) is a known technique for providing increased storage and/or reliability through redundancy. It combines multiple data storage devices, such as disk drives (typically low-cost and therefore often less reliable), into a single logical storage unit in which all drives in the array are substantially independent. Various RAID schemes have been defined, whereby RAID units divide and replicate data among multiple storage devices and protect stored data in a variety of ways and configurations.
Three key concepts underlie RAID schemes: mirroring, in which multiple disks contain identical data; striping, in which sequential blocks of data are split among multiple disks; and error correction, in which redundant parity data is stored so that errors can be detected and possibly repaired. For example, FIG. 1 illustrates an array 100 of N data storage devices 110 in which a RAID 5 scheme has been implemented. RAID 5 uses block-level striping with distributed parity. As illustrated in FIG. 1, data is ‘striped’ across the data storage devices 110 such that consecutive blocks of data 120 are distributed across N−1 of the data storage devices 110, and a parity bit/word 125 for the striped blocks of data 120, typically the bitwise exclusive-OR (XOR) of those blocks, is stored in the Nth data storage device. The parity bits/words for successive stripes are distributed over all N data storage devices 110, rather than being stored within a single, dedicated data storage device. Using parity bits/words in this manner allows data to be recovered if a data storage device 110 becomes ‘unavailable’, for example because it becomes faulty or is physically removed or disconnected. Furthermore, distributing the parity bits/words across the data storage devices 110 as illustrated in FIG. 1 makes retrieving the parity data less prone to the bottleneck of reading all parity data from a single device, since read operations on the multiple data storage devices 110 may be performed substantially simultaneously. Accordingly, recovering data takes less time.
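The striping, parity rotation and single-device recovery described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the scheme of FIG. 1: the 4-byte block size, the convention that parity starts on the last device and moves one device to the left per stripe, and all function names are hypothetical. Parity is taken as the byte-wise XOR of the data blocks in a stripe, as in conventional RAID 5.

```python
from functools import reduce

BLOCK = 4  # bytes per block; toy size chosen for illustration (assumption)


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_disk(stripe, n):
    """Assumed rotation: stripe 0 puts parity on device n-1, stripe 1 on n-2, ..."""
    return (n - 1 - stripe) % n


def write_stripes(data, n):
    """Lay data out over n devices: n-1 data blocks plus one parity block per stripe."""
    blocks = [data[i:i + BLOCK].ljust(BLOCK, b"\0") for i in range(0, len(data), BLOCK)]
    disks = [[] for _ in range(n)]
    for stripe, start in enumerate(range(0, len(blocks), n - 1)):
        stripe_data = blocks[start:start + n - 1]
        stripe_data += [bytes(BLOCK)] * (n - 1 - len(stripe_data))  # pad final stripe
        parity = xor_blocks(stripe_data)
        p = parity_disk(stripe, n)
        data_iter = iter(stripe_data)
        for d in range(n):
            # Data fills the devices in order, skipping the parity device.
            disks[d].append(parity if d == p else next(data_iter))
    return disks


def rebuild(disks, lost):
    """Recover every block of device `lost` by XOR-ing the surviving devices' blocks."""
    survivors = [blocks for i, blocks in enumerate(disks) if i != lost]
    return [xor_blocks(row) for row in zip(*survivors)]


disks = write_stripes(b"RAID 5 stripes data with rotating parity", 4)
assert all(rebuild(disks, lost) == disks[lost] for lost in range(4))
```

Recovery works because the XOR of all N blocks in a stripe is zero, so any single missing block, whether data or parity, equals the XOR of the other N−1. Rebuilding one device therefore means reading every surviving device in full.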
However, an inherent limitation of such a RAID 5 scheme is that all but one of the data storage devices 110 must be present in order to recover data. Consequently, if more than one data storage device becomes unavailable (e.g. faulty), the data cannot be recovered. When a data storage device becomes ‘unavailable’, e.g. develops a fault or the like, the data stored on it must be recovered using the data and parity bits/words stored on the remaining data storage devices, and the entire ‘database’ must be rebuilt. Specifically, all of the data stored within the array 100 must be re-written to the remaining data storage devices and the parity bits/words regenerated, and this must be completed before any new data can be written to the array. RAID 6 extends traditional RAID 5 schemes with double distributed parity blocks (e.g. a second parity block computed using Galois field arithmetic), which allows the array to tolerate the failure of up to two data storage devices.
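The double parity of RAID 6 can likewise be sketched. The code below computes, for one byte position across the data devices, a P syndrome (plain XOR) and a Q syndrome in which each data byte is weighted by a distinct power of a generator g in the Galois field GF(2^8), and then recovers any two lost data bytes. The field polynomial 0x11D and g = 2 are assumptions (they are the choices made by the Linux md RAID 6 driver); the function names are hypothetical, and practical implementations use lookup tables rather than this bit-by-bit multiplication.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of nonzero a: a^254, since the nonzero elements form a group of order 255."""
    return gf_pow(a, 254)


def syndromes(data):
    """P = XOR of D_i; Q = XOR of g^i * D_i with g = 2 (assumed generator)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data, p, q, x, y):
    """Recover data bytes x and y (both lost, marked None) from P, Q and the survivors."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d                        # leaves D_x ^ D_y
            qxy ^= gf_mul(gf_pow(2, i), d)  # leaves g^x*D_x ^ g^y*D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # Solve the two linear equations: D_x = (Qxy ^ g^y * Pxy) / (g^x ^ g^y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    return dx, pxy ^ dx


data = [0x12, 0x34, 0x56, 0x78, 0x9A]
p, q = syndromes(data)
damaged = list(data)
damaged[1] = damaged[3] = None
assert recover_two(damaged, p, q, 1, 3) == (0x34, 0x78)
```

Because g^x differs from g^y for distinct device indices (up to 255 data devices), the divisor is never zero. The other double-failure cases (losing P or Q themselves, or one data device together with P) are simpler and are omitted from this sketch.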
As the capacity of data storage devices increases, the time taken to perform such data recovery grows accordingly. Furthermore, as the number of data storage devices within an array increases, so does the frequency with which devices in the array become ‘unavailable’ (e.g. through device failure). Accordingly, there is a need to minimize the time taken to perform data recovery and to allow operation of the array to resume.
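Both trends can be put into rough numbers under simplifying assumptions (the figures below are illustrative, not taken from this document). A rebuild must at least transfer one full device, so its duration is bounded below by the device capacity C divided by the sustained rebuild throughput R. If each of N devices independently fails within a given period with probability p (an idealized assumption), the probability that at least one device fails rises with N:

$$
T_{\text{rebuild}} \ge \frac{C}{R}, \qquad \text{e.g. } \frac{20\times 10^{12}\ \text{B}}{200\times 10^{6}\ \text{B/s}} = 10^{5}\ \text{s} \approx 27.8\ \text{h}
$$

$$
P(\text{at least one of } N \text{ devices fails}) = 1 - (1 - p)^{N}
$$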
However, once the data from a ‘lost’ data storage device has been recovered, the data and the corresponding parity bits/words cannot simply be re-written to the remaining data storage devices using, say, the existing RAID command stack, because the RAID algorithms, etc., are configured for an array of N data storage devices. Following such data recovery, the array comprises only N−1 available data storage devices until the lost data storage device is repaired or replaced, so the RAID algorithms, etc., must be reconfigured. In a hardware implementation of a RAID controller, this typically requires the entire RAID command stack to be re-written, which must be performed by software executing on, for example, a central processing unit (CPU) of the system. Rewriting the RAID command stack in this way is a laborious and time-consuming process that not only delays the return of the array to operation but also consumes valuable system processing resources.
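Why a layout computed for N devices cannot simply be reused for N−1 can be illustrated with a toy address map. The sketch assumes a hypothetical rotating-parity RAID 5 layout (n−1 data blocks per stripe, parity on device (n−1−stripe) mod n, data filling the other devices in order) and counts how many logical blocks land in a different place when n drops by one. The layout and function names are assumptions for illustration, not the controller behaviour described here.

```python
def locate(lba, n):
    """Map logical block `lba` to (device, stripe) in an n-device rotating-parity layout."""
    stripe, k = divmod(lba, n - 1)       # n-1 data blocks per stripe
    parity = (n - 1 - stripe) % n        # assumed parity rotation
    device = k if k < parity else k + 1  # skip the parity device
    return device, stripe


def blocks_moved(total, n):
    """Number of logical blocks whose location differs between n and n-1 devices."""
    return sum(locate(b, n) != locate(b, n - 1) for b in range(total))


moved = blocks_moved(1200, 5)
print(f"{moved} of 1200 blocks must be relocated")
```

Because the stripe width itself changes, nearly every block moves and every stripe's parity must be recomputed: going from five devices to four, only the first three of the 1200 blocks stay where they were. The array therefore has to be re-laid out as a whole rather than patched.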
Owing to the complexity of the operations that RAID algorithms must perform, and the typically limited resources available to them, known RAID controllers are also restricted to data storage devices having equal storage capacities.
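The cost of an equal-capacity restriction can be shown with simple arithmetic. The sketch assumes, as an illustration rather than a statement about any particular controller, that unequal devices are accommodated by treating each one as if it had the capacity of the smallest, so RAID 5 yields (N−1) × the smallest capacity and the excess on larger devices goes unused. The function names and drive sizes are hypothetical.

```python
def raid5_usable(capacities):
    """Usable RAID 5 capacity when each device is truncated to the smallest one."""
    return (len(capacities) - 1) * min(capacities)


def stranded(capacities):
    """Capacity left unused on larger devices after truncation."""
    return sum(capacities) - len(capacities) * min(capacities)


drives_tb = [4, 4, 8, 8]  # hypothetical mix of device sizes, in TB
print(raid5_usable(drives_tb), stranded(drives_tb))
```

With two 4 TB and two 8 TB devices, 12 TB is usable, one device's worth (4 TB) holds parity, and 8 TB of the 24 TB raw capacity is stranded.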