The present invention, in some embodiments thereof, relates to a method and apparatus for secure data storage in RAID memory devices and, more particularly, but not exclusively, to such a method and apparatus which conforms to the requirements of the RAID 6 specification for data recovery following two disk failures.
RAID is an acronym for Redundant Array of Independent Disks, and is a system for storing data on multiple disks in which redundancy of data storage between the disks ensures recovery of the data in the event of failure. This is achieved by combining multiple disk drive components into a logical unit, where data is distributed across the drives in one of several ways called RAID levels.
RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical disk drives. The terms disks and drives will be used interchangeably henceforth. The physical disks are said to be in a RAID array, which is accessed by the operating system as one single disk. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance between two key goals: increasing data reliability and increasing input/output performance.
The most basic form of RAID, a building block for the other levels but not itself used for data protection, is RAID 0, which has high performance but no redundancy. The data is spread evenly between N disks. RAID 0 gives maximum performance since data retrieval is carried out on all N disks in parallel. However, each data item is stored exactly once, so any disk failure loses some data.
RAID 1 requires mirroring of all the data. Capacity drops by 50% since all data is stored twice, but excellent performance is still achieved since the data is still spread between disks in the same way, allowing for parallel reads. Multiple disk failures can be tolerated, but only one failure per mirrored pair is possible without loss of data.
In greater detail, RAID 1 is mirroring. Mirroring comprises writing each block of data to two disks, D0 and D1, and reconstructing a disk by copying its mirror disk upon failure. This method requires performing two disk writes per user write, and consumes an overhead of 100% in capacity. Its rebuild requires performing reads and writes in proportion to the size of the failed disk, without additional computation penalties. Additionally, reading data which resided on the failed disk while in degraded mode requires a single disk read, just as under a normal system operation.
In general, RAID-1 protects from single disk failure. It may protect from more than one failure if no two failed disks are part of the same pair, known as a “RAID group”. RAID-1 may also be implemented in “n-way mirroring” mode to protect against any n−1 disk failures. An example is RAID 1.3 which introduced three way mirroring, so that any two disks could fail and all the data could still be recovered. The cost however is that there is only 33% utilization of the disks.
A requirement thus became apparent for a system that could recover all data after the failure of any single disk at a more reasonable overhead, and as a result RAID 4 was developed.
RAID 4 uses a parity bit to allow data recovery following the loss of a bit. In RAID 4, data is written over a series of N disks and a parity bit is then set on disk N+1. Thus if N is 9, data is written to 9 disks, and on the tenth a parity of the nine bits is written. If one disk fails, the parity allows for recovery of the lost bit. The failure problem is thus solved without any major loss of capacity: the utilization rate is 90%. However, the tenth disk has to be updated with every change of every single bit on any of the nine disks, thus causing a system bottleneck.
In greater detail, a RAID-4 group contains k data disks and a single parity disk. Each block i in the parity disk P contains the XOR of the blocks at location i in each of the data disks. Reconstructing a failed disk is done by computing the parity of the remaining k disks. The capacity overhead is 1/k. This method contains two types of user writes—full stripe writes known as “encode” and partial stripe modifications known as “update”. When encoding a full stripe, an additional disk write must be performed for every k user writes, and k−1 XORs must be performed to calculate the parity. When modifying a single block in the stripe, two disk reads and two disk writes must be performed, as well as two XORs to compute the new parity value. The rebuild of a failed block requires reading k blocks, performing k−1 XORs, and writing the computed value. Reading data which resided on the failed disk while in degraded mode also requires k disk reads and k−1 XOR computations. RAID-4, like RAID-1, protects from a single disk failure.
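The encode, update and rebuild operations described above can be sketched as follows. This is a minimal illustration only, assuming blocks are modeled as equal-length byte strings; the function names (`encode_parity`, `update_parity`, `rebuild_block`) are hypothetical and are not taken from any particular implementation.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_blocks):
    """Full-stripe write: parity is the XOR of the k data blocks (k-1 XORs)."""
    return reduce(xor_blocks, data_blocks)


def update_parity(old_parity, old_block, new_block):
    """Partial-stripe update: two XORs, using the old data and old parity
    (the two disk reads); the caller then writes new data and new parity."""
    return xor_blocks(xor_blocks(old_parity, old_block), new_block)


def rebuild_block(surviving_blocks):
    """Rebuild a lost block from the k surviving blocks of the stripe
    (the remaining k-1 data blocks plus the parity block)."""
    return reduce(xor_blocks, surviving_blocks)


# Hypothetical stripe with k = 3 data disks and 4-byte blocks.
stripe = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = encode_parity(stripe)

# Disk 1 fails: recover its block from the other data blocks and the parity.
recovered = rebuild_block([stripe[0], stripe[2], parity])

# Update the block on disk 1 without touching the other data disks.
new_block = bytes([0xAA] * 4)
new_parity = update_parity(parity, stripe[1], new_block)
```

The update path reads only the old data block and the old parity block, which is why a single-block modification costs two reads and two writes regardless of k.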
RAID 5 solves the bottleneck problem of RAID 4 in that parity stripes are spread over all the disks. Thus, although some parity bit somewhere has to be changed with every single change in the data, the changes are spread over all the disks and no bottleneck develops.
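The spreading of parity over all the disks can be illustrated with a simple rotation. The sketch below assumes one particular rotating layout, in which the parity block moves one disk to the left on each successive stripe; actual RAID 5 implementations use several different layouts, and the function name `parity_disk` is hypothetical.

```python
from collections import Counter


def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Disk that holds the parity block of the given stripe, under an
    assumed layout that rotates parity one disk to the left per stripe."""
    return (num_disks - 1 - stripe_index) % num_disks


num_disks = 5
layout = [parity_disk(s, num_disks) for s in range(10)]

# Count how many of the ten stripes place their parity on each disk.
parity_load = Counter(layout)
```

Over any run of stripes whose length is a multiple of the number of disks, every disk holds the same number of parity blocks, so parity updates are not concentrated on a single disk.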
However RAID 5 still only allows for a single disk failure.
In order to combine the multiple-disk-failure tolerance of RAID 1.3 with the high utilization rates of RAID 4 and 5, and in addition to avoid system bottlenecks, RAID 6 was specified to use an N+2 parity scheme that allows failure of two disks. RAID 6 defines block-level striping with double distributed parity and provides fault tolerance of two drive failures, so that the array continues to operate with up to two failed drives, irrespective of which two drives fail. Larger RAID disk groups become more practical, especially for high-availability systems. This becomes increasingly important as large-capacity drives lengthen the time needed to recover from the failure of a single drive. Following loss of a drive, single-parity RAID levels are as vulnerable to data loss as a RAID 0 array until the failed drive is replaced and its data rebuilt, and the larger the drive, the longer the rebuild takes, causing a large vulnerability interval. The double parity provided by RAID 6 gives time to rebuild the array without the data being at risk if a single additional drive fails before the rebuild is complete.
Reference is now made to FIG. 1, which illustrates a general scheme for RAID-6. RAID-6 is similar to RAID-4 and RAID-5, and can be seen as an extension of these schemes. The main difference is that RAID-6 schemes can tolerate up to two disk failures. The implementation of RAID-6 is not well defined, and several coding schemes are known. RAID-6 is herein defined as any N+2 coding scheme which tolerates double disk failure, while user data is kept in the clear. This additional requirement assures that user reads are not affected by the RAID scheme under normal system operation. The different possible coding schemes vary in performance with respect to various parameters and typical parameters are shown in Table 1.
TABLE 1
RAID 6 Parameters (Prior Art)

Parameter                              Optimal Value
Capacity Overhead                      2/k
Update Overhead                        2 Writes, 3 Reads, 3 XORs
Rebuild Overhead (1st disk failure)    k/2 Reads, k − 1 XORs
Rebuild Overhead (2nd disk failure)    k/2 Reads, k − 1 XORs
Failed Disks Supported                 2
With reference to Table 1, we now describe the main parameters used to measure such a RAID scheme, alongside their optimal values. The first parameter is capacity overhead. The optimal scheme includes two redundancy disks (which may or may not be parity based) for every k data disks, thus reaching a capacity overhead of 2/k. It should be noted, that based on statistical considerations of double disk failure, under a RAID-6 scheme k can easily be set to be twice as large as under RAID-5, thus keeping the same capacity overhead ratio.
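The capacity figures quoted in this section can be checked with a short calculation. The sketch below is illustrative only; the helper names `capacity_overhead` and `utilization` are hypothetical, and the disk counts k = 8 and k = 16 are example values chosen here, not values given in the text.

```python
from fractions import Fraction


def capacity_overhead(data_disks: int, redundancy_disks: int) -> Fraction:
    """Redundant capacity expressed as a fraction of user-data capacity."""
    return Fraction(redundancy_disks, data_disks)


def utilization(data_disks: int, redundancy_disks: int) -> Fraction:
    """Fraction of the raw capacity that holds user data."""
    return Fraction(data_disks, data_disks + redundancy_disks)


mirror_overhead = capacity_overhead(1, 1)    # RAID 1: 100% overhead
three_way_util = utilization(1, 2)           # three-way mirroring
raid4_util = utilization(9, 1)               # nine data disks, one parity disk

# Doubling k under RAID-6 keeps the overhead ratio of RAID-5 (example k values).
raid5_overhead = capacity_overhead(8, 1)     # 1/k with k = 8
raid6_overhead = capacity_overhead(16, 2)    # 2/k with k = 16
```

With these example values both `raid5_overhead` and `raid6_overhead` reduce to 1/8, which is the sense in which doubling k keeps the overhead ratio unchanged.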
When updating a certain block in a stripe, we are interested in the number of IOs required and the number of calculations that must be performed. The optimum is three reads, three writes (the data block and the two redundancy blocks) and three XORs.
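One way in which an update can reach three XORs is sketched below. The sketch assumes a hypothetical scheme in which both redundancy blocks are plain XOR parities and the updated data block contributes to exactly one P block and one Q block; not every RAID-6 code satisfies this assumption. The function name `update_redundancy` is hypothetical.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def update_redundancy(old_data, new_data, old_p, old_q):
    """Compute new P and Q after one data block changes, using three XORs.
    Inputs are the three blocks read from disk plus the new user data."""
    delta = xor_blocks(old_data, new_data)   # XOR 1: what changed
    new_p = xor_blocks(old_p, delta)         # XOR 2: fold change into P
    new_q = xor_blocks(old_q, delta)         # XOR 3: fold change into Q
    return new_p, new_q


# Hypothetical example: block d shares a P set with a and a Q set with b.
d, a, b = bytes([1, 2]), bytes([3, 4]), bytes([5, 6])
old_p = xor_blocks(d, a)
old_q = xor_blocks(d, b)

new_d = bytes([9, 9])
new_p, new_q = update_redundancy(d, new_d, old_p, old_q)
```

The three reads are the old data block, the old P block and the old Q block; the three writes are their new values.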
RAID-6 rebuild includes two different processes—rebuilding after one disk failure, and rebuilding after two disk failures. After a single disk failure, the optimal number of reads needed is k/2, as opposed to k reads in RAID-4. Such optimal performance requires codes which permit reading partial columns, by taking advantage of both redundancy blocks of the stripe, as described in greater detail hereinbelow. The minimal number of XORs required is k−1. After the second disk failure, rebuilding a failed block, on average, requires reading k/2 blocks, performing k−1 XORs, and writing the computed value. It should be noted that this does not imply that rebuilding a specific block can be done efficiently, since the rebuilding of one block may depend upon the rebuilding of a different block.
In order to prevent bottlenecks, RAID-6 may also be implemented in the manner of RAID-5, where redundancy information is spread on the various disks in a well-balanced manner.
RAID 6 does not prescribe how the data recovery is to be achieved, and each storage manufacturer embodies RAID 6 in a different way.
Several RAID-6 schemes have been proposed and used in practice. One solution is to use the Reed-Solomon error correction code, which is expensive to calculate.
Another possibility is to use parity bits. N data disks are supported by two redundancy disks p1 and p2, each one holding a different parity bit. Again, if all the parity bits are on the same two disks, the bottleneck becomes a problem. However, the problem can be solved by use of distributed parity stripes over N+2 disks, as was specified in RAID 5.
The following describes two such coding schemes which are based on parity calculations of rows and diagonals in a matrix of blocks. These two codes are known as Even/Odd and RDP. They both add a second parity disk, labeled Q, which contains blocks that hold the parity of diagonals of the data blocks. P, as before, contains blocks that hold the parities of rows of blocks. Note that in both schemes it is advantageous to work with a block size which is smaller than the native page size. For the examples in this section we assume that the native page size is 4 KB and that the block size is 1 KB. Each stripe contains four rows, and thus the four blocks present on each disk form a single native page. It is assumed that pages are read and written using a single disk operation.
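The row and diagonal parity calculation over a four-row stripe of 1 KB blocks can be sketched as follows. The diagonal numbering used here, (row + column) mod 5 taken over four data columns and the P column with one diagonal left unstored, follows the RDP construction for the prime 5 as understood here and should be treated as an assumption rather than as the definitive layout of either code. All names and constants (`P_PRIME`, `encode`) are hypothetical.

```python
P_PRIME = 5                  # assumed prime: yields 4 rows and 4 data disks
ROWS = P_PRIME - 1
DATA_DISKS = P_PRIME - 1
BLOCK = 1024                 # 1 KB blocks; four per disk form a 4 KB page


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data):
    """data[r][c] is the block at row r of data disk c.
    Returns the P column (row parities) and Q column (diagonal parities)."""
    zero = bytes(BLOCK)
    p_col = []
    for r in range(ROWS):
        acc = zero
        for c in range(DATA_DISKS):
            acc = xor_blocks(acc, data[r][c])
        p_col.append(acc)

    q_col = [zero] * ROWS
    for r in range(ROWS):
        row = list(data[r]) + [p_col[r]]     # P is treated as column 4
        for c, blk in enumerate(row):
            diag = (r + c) % P_PRIME
            if diag < ROWS:                  # diagonal 4 is not stored
                q_col[diag] = xor_blocks(q_col[diag], blk)
    return p_col, q_col


# Deterministic example data: block (r, c) is filled with the byte r*4 + c.
data = [[bytes([r * DATA_DISKS + c]) * BLOCK for c in range(DATA_DISKS)]
        for r in range(ROWS)]
p_col, q_col = encode(data)

# The four blocks stored on data disk 0 make up one native 4 KB page.
page_disk0 = b"".join(data[r][0] for r in range(ROWS))
```

Under this numbering, the block at row 1 of data disk 2 lies on diagonal 3 together with blocks (0, 3), (2, 1) and (3, 0), so it can be recovered either from its row via P or from its diagonal via Q.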