Current practice in storage networking implementations is to rely on hardware and software implementations of the well-known Redundant Array of Independent Disks (RAID) technology to protect computing applications, and the data they use, against interruption and against the loss of, or damage to, that data caused by the failure of the storage devices that hold it. Hardware and software RAID implementations are effective in protecting against the failure of a single device and may, in some circumstances, be used to protect against the failure of multiple devices.
Storage devices are typically connected to the computer systems that run the applications using the data stored on those devices. A failure of such a computer system can, independent of any failure of the storage devices, cause the loss of, or damage to, data that the application is using at the time of the failure. Current practice is to guard against computer system failures with journaling schemes that record the parameters of storage operations that have not yet completed. After the computer system recovers from a failure, such journals can be replayed to restore the contents of the storage devices to a known state, allowing applications to resume processing from the point of failure.
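As an illustration of journal replay, the following is a minimal sketch that assumes a redo-style intent journal held in memory: each write's parameters are logged before the device is touched and marked committed once the write lands. The names `IntentRecord`, `Journal`, `log`, `commit`, and `replay` are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class IntentRecord:
    """Hypothetical journal entry: the parameters needed to redo one block write."""
    seq: int
    device: str
    block: int
    data: bytes


@dataclass
class Journal:
    records: list = field(default_factory=list)
    committed: set = field(default_factory=set)

    def log(self, record: IntentRecord) -> None:
        # The intent is recorded (and, in a real system, made durable)
        # before the storage device is touched.
        self.records.append(record)

    def commit(self, seq: int) -> None:
        # Called once the write has reached the storage device.
        self.committed.add(seq)

    def replay(self, devices: dict) -> list:
        """Redo every logged write not marked committed; return their sequence numbers."""
        replayed = []
        for rec in self.records:
            if rec.seq not in self.committed:
                # Rewriting the full block contents is idempotent, so replaying
                # a write that did in fact complete is harmless.
                devices[rec.device][rec.block] = rec.data
                self.committed.add(rec.seq)
                replayed.append(rec.seq)
        return replayed


# Simulated crash: write 2 was logged but never reached the device.
devices = {"disk0": {}}
journal = Journal()
journal.log(IntentRecord(1, "disk0", 7, b"A"))
devices["disk0"][7] = b"A"
journal.commit(1)
journal.log(IntentRecord(2, "disk0", 8, b"B"))
print(journal.replay(devices))  # [2]
```

Logging whole block contents rather than deltas keeps replay idempotent, which matters if the system fails again during recovery.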
Well-known RAID schemes use a parity computation to protect data against device failure. In typical implementations, such as the prior art embodiment shown in FIG. 1, when a block of binary data is written to a RAID controller, the controller uses the contents of one or more blocks of data previously or concurrently written to the controller to compute a parity block, and then writes the original data and the computed parity block to physical storage devices 13. The prior art RAID configuration insulates against the failure of a storage device by ensuring that each parity block resides on a storage device different from the device or devices holding the blocks used to compute it. Should a device holding the original data fail, the RAID controller can still respond to an access request by reading the parity block associated with the original data, along with the other blocks that form the RAID block stripe group, recovering the data from the failed device using the parity algorithm, and sending it back to the requesting storage client.
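The text refers only to "the parity algorithm"; the sketch below assumes the byte-wise XOR parity used by RAID 4 and RAID 5. It shows both halves of the scheme: computing the parity block of a stripe, and recovering one lost data block from the surviving blocks plus the parity. The function names are illustrative.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of one or more equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(size)
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def compute_parity(data_blocks):
    """Parity block for a stripe: XOR of all of its data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity):
    """Recover the one missing data block from the surviving data blocks and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(stripe)                   # b"\xee\x22"
lost = reconstruct([stripe[0], stripe[2]], parity)
print(lost == stripe[1])                          # True
```

Because XOR is its own inverse, the same routine serves for both parity generation and reconstruction; it can recover only one missing block per stripe.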
In a standard RAID scheme, the parity blocks are distributed using one of two straightforward schemes. The RAID Level 4 (RAID 4) scheme puts all of the parity data on a designated storage device, while the RAID Level 5 (RAID 5) scheme scatters the parity blocks across all of the storage devices in the RAID pool using a round-robin algorithm. Prior art RAID schemes have only one storage controller, which is equivalent to a single storage server in the context of the current invention. The standard RAID distribution scheme does not work in a clustered computing environment, because the clustered storage server pool contains more than one server and may contain a large number of storage servers. Distributing the parity computations over many servers requires that each server be able to locate both the data blocks needed for the computation and the parity blocks, both of which are distributed across all of the storage servers.
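The difference between the two placement schemes can be sketched as a function from stripe number to parity device. This is a minimal sketch under stated assumptions: RAID 4 is assumed to use the last device as its designated parity device, and the RAID 5 round robin is assumed to move parity one device to the left per stripe. Real implementations offer several rotation variants, and `parity_device` is a hypothetical name.

```python
def parity_device(stripe: int, num_devices: int, level: int) -> int:
    """Index of the device that holds the parity block of a given stripe."""
    if num_devices < 3:
        raise ValueError("this sketch assumes at least three devices")
    if level == 4:
        # RAID 4: one designated parity device (assumed to be the last one).
        return num_devices - 1
    if level == 5:
        # RAID 5: round robin, parity moves one device to the left per stripe.
        return (num_devices - 1 - stripe) % num_devices
    raise ValueError("only RAID levels 4 and 5 are modelled")


print([parity_device(s, 4, 4) for s in range(5)])  # [3, 3, 3, 3, 3]
print([parity_device(s, 4, 5) for s in range(5)])  # [3, 2, 1, 0, 3]
```

Under RAID 4 every parity update lands on the same device, whereas the RAID 5 rotation spreads parity updates evenly across the pool.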
There exists therefore a need for a scalable error-recovery scheme against storage device failure to be used in a clustered computing environment.
The RAID-5 scheme, widely used in the hardware and software implementations found in many data storage products, operates on a pool of storage devices by applying the RAID algorithm. Configuring a RAID-5 implementation involves specifying the RAID stripe size. The implementing hardware or software then scatters the data blocks and their associated parity block over the physical storage devices in a manner that guarantees that, if one of the storage devices fails, the remaining devices can provide all of the data blocks and parity blocks needed to reconstruct the data on the failed device.
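The single-failure guarantee follows from placing every block of a stripe on a different device. The sketch below assumes a stripe size equal to the number of devices (one block per device per stripe, one of them parity) and a hypothetical rotating-parity layout in which data blocks fill the non-parity devices in order. `locate` and `stripe_devices` are illustrative names, and the loop checks the placement property for a few pool sizes.

```python
def locate(logical_block: int, num_devices: int) -> tuple:
    """Map a logical data block to (stripe, device) in a rotating-parity layout."""
    data_per_stripe = num_devices - 1          # one block per device, one of them parity
    stripe, offset = divmod(logical_block, data_per_stripe)
    parity = (num_devices - 1 - stripe) % num_devices
    # Data blocks fill the non-parity devices in ascending order.
    device = offset if offset < parity else offset + 1
    return stripe, device


def stripe_devices(stripe: int, num_devices: int) -> tuple:
    """Devices holding the data blocks and the parity block of one stripe."""
    first = stripe * (num_devices - 1)
    data = [locate(first + k, num_devices)[1] for k in range(num_devices - 1)]
    parity = (num_devices - 1 - stripe) % num_devices
    return data, parity


# Single-failure guarantee: every block of a stripe lives on a different device,
# so losing any one device removes at most one block from each stripe.
for n in range(3, 7):
    for s in range(2 * n):
        data, parity = stripe_devices(s, n)
        assert sorted(data + [parity]) == list(range(n))
print(locate(5, 4))  # (1, 3)
```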
RAID-5 implementations suffer from severe performance problems when processing write operations. To update a single data block on a RAID-5 storage pool with a stripe size of N, the implementing hardware or software must first read the N−2 other data blocks in the stripe, compute the new parity block, and then write the new contents of the data block and the parity block to their respective storage devices.
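The cost of a single-block update can be made concrete by counting I/O operations. This sketch assumes N counts the parity block, so a stripe holds N−1 data blocks and an update reads the N−2 data blocks other than the target before issuing two writes. `Device` is a hypothetical in-memory stand-in for a storage device, and `update_block` is an illustrative name.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


class Device:
    """Hypothetical in-memory device holding one block of the stripe; counts I/O."""
    def __init__(self, block: bytes):
        self.block = block
        self.reads = 0
        self.writes = 0

    def read(self) -> bytes:
        self.reads += 1
        return self.block

    def write(self, data: bytes) -> None:
        self.writes += 1
        self.block = data


def update_block(devices, parity_index, target_index, new_data):
    """Update one data block of a stripe of N = len(devices) blocks, parity included."""
    # Read the N-2 data blocks that are neither the target nor the parity block.
    others = [dev.read() for i, dev in enumerate(devices)
              if i not in (parity_index, target_index)]
    # New parity is the XOR of the unchanged data blocks and the new data.
    new_parity = xor_blocks(others + [new_data])
    # Two writes, which this sketch does not make atomic.
    devices[target_index].write(new_data)
    devices[parity_index].write(new_parity)
    return len(others)


N = 5
data = [bytes([v]) for v in (1, 2, 3, 4)]
devices = [Device(b) for b in data] + [Device(xor_blocks(data))]
reads = update_block(devices, parity_index=4, target_index=1, new_data=b"\x09")
print(reads, sum(d.writes for d in devices))  # 3 2
```

A one-block logical write therefore costs N−2 reads plus two writes, and the read count grows with the stripe size.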
A further complication is that steps must be taken to ensure that the other data blocks in the stripe are not modified while the parity block is being computed. These characteristics of RAID-5 increase the latency of write operations, which shows up as very poor write performance compared with a system that simply writes each block to a single disk unit.
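One common way to keep the other blocks of a stripe stable during the parity computation is a per-stripe lock; the text does not say which mechanism is used, so this is an assumption. In the sketch, `StripeLocks` and `locked_update` are hypothetical names, the stripe is an in-memory list, and three threads repeatedly update different blocks of the same stripe.

```python
import threading
from collections import defaultdict


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


class StripeLocks:
    """Hypothetical lock table holding one mutex per stripe number."""
    def __init__(self):
        self._guard = threading.Lock()              # protects the table itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, stripe: int) -> threading.Lock:
        with self._guard:
            return self._locks[stripe]


def locked_update(stripes, locks, stripe_no, parity_index, target_index, new_data):
    """Write one data block and recompute parity while no other writer can touch the stripe."""
    blocks = stripes[stripe_no]
    with locks.lock_for(stripe_no):
        blocks[target_index] = new_data
        data = [b for i, b in enumerate(blocks) if i != parity_index]
        blocks[parity_index] = xor_blocks(data)


stripes = {0: [b"\x00"] * 4}    # three data blocks plus parity at index 3
locks = StripeLocks()


def writer(k):
    for j in range(200):
        locked_update(stripes, locks, 0, 3, k, bytes([(k * 50 + j) % 256]))


threads = [threading.Thread(target=writer, args=(k,)) for k in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
blocks = stripes[0]
print(blocks[3] == xor_blocks(blocks[:3]))  # True
```

The lock keeps parity consistent, but it also serializes every writer that touches the same stripe, which adds to the write latency described above.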
There exists therefore a need for a method of data protection and recovery on storage pools that provides improved write performance.