Computer systems generally employ data storage devices, such as disk drives or solid-state storage devices, for storage and retrieval of large amounts of data. Usually the data storage devices are arranged in an array. The most common type of storage device array is the RAID (Redundant Array of Inexpensive, or Independent, Drives). Arrays of solid-state storage devices, such as flash memory, phase change memory, memristors, and other non-volatile storage units, can also be used in data storage systems operating in accordance with RAID principles.
RAID-based data storage systems may use a combination of mirroring and/or striping to provide greater protection against lost (or corrupted) data. For example, in some variants of the RAID system, data is interleaved in stripe units distributed with parity information across all of the storage devices. The parity scheme in a RAID utilizes either a two-dimensional XOR algorithm or a Reed-Solomon code in a P+Q redundancy scheme.
The main concept of RAID is the ability to virtualize multiple drives (or other types of storage devices) as a single storage device. A number of RAID schemes have evolved, each designed on the principles of aggregated storage space and data redundancy. Five standard RAID levels were originally conceived, but many more variations have evolved. The most commonly used RAID levels include:
RAID 0 provides block-level striping without parity or mirroring and therefore offers no redundancy.
RAID 1 uses mirroring without parity or striping: data is written identically to two storage devices, producing a mirrored set. A “read” request is serviced by either of the two storage devices containing the requested data, and a “write” request writes the data to both storage devices.
RAID 10 uses both mirroring and striping: data is written in stripes across the primary storage devices and then mirrored to the secondary storage devices.
RAID 2 uses bit-level striping with dedicated Hamming-code parity: data is striped such that each sequential bit is on a different storage device.
RAID 3 uses byte-level striping with dedicated parity: data is striped so that each sequential byte is on a different storage device. Parity is calculated across corresponding bytes and stored on a dedicated parity storage device.
RAID 4 employs block-level striping with dedicated parity. Its data distribution across drives is similar to that of RAID 3, but the granularity of the distributed data in RAID 4 (block-level) is coarser than that employed by RAID 3 (byte-level). In this setup, files may be distributed among multiple drives, and each drive operates independently, allowing I/O requests to be performed in parallel.
RAID 5 uses block-level striping with distributed parity: parity information is distributed along with the data, and all storage devices but one must be present for the array to operate. A single storage device failure therefore does not destroy the array; upon such a failure, any subsequent reads can be reconstructed from the distributed parity, so that the failure is masked from the end user.
RAID 6 uses block-level striping with double distributed parity, and tolerates up to two concurrent device failures.
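The block-level striping common to several of these levels can be sketched in a few lines. The following is a minimal illustration (not any particular controller's implementation) of how plain RAID 0 striping maps a logical block number to a drive and a stripe:

```python
def block_location(logical_block: int, num_drives: int):
    """Map a logical block to (drive index, stripe index) under plain
    RAID 0 striping: consecutive blocks round-robin across the drives."""
    return logical_block % num_drives, logical_block // num_drives

# Blocks 0..5 on a 3-drive RAID 0 array land on drives 0, 1, 2, 0, 1, 2.
layout = [block_location(b, 3) for b in range(6)]
```

RAID 4, 5, and 6 use the same round-robin idea at block granularity, with one or two positions in each stripe reserved for parity.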
Most RAID schemes employ an error protection scheme called “parity,” a widely used method in information technology for providing fault tolerance in a given set of data. An example RAID-5 storage device is illustrated in FIG. 1. In this example device, user data is aggregated into four stripes, each consisting of several blocks (blocks A1, B1, C1, D1, A2, . . . ). Each stripe also includes a dedicated parity block (blocks Ap, Bp, Cp, and Dp) generated from the user data by a parity generation algorithm (such as an XOR scheme). Stripes are spread across all hard drives so that each drive absorbs one block from each stripe (either a parity block or a data block). In this example, the parity block placement is shifted between stripes so that the parity data is distributed among all the drives. The block size per drive can vary, for example, from 4 KB to 256 KB per stripe, and can be modified during device setup or configuration to adjust performance.
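The shifting of the parity block between stripes can be expressed as a simple modular rule. The sketch below assumes one common rotation (parity on the last drive for the first stripe, then shifting by one drive per stripe, as in FIG. 1); actual controllers may use other rotations:

```python
def parity_drive(stripe_index: int, num_drives: int) -> int:
    """Drive index holding the parity block for a given stripe,
    assuming parity starts on the last drive and shifts by one
    drive on each successive stripe (an illustrative rotation)."""
    return (num_drives - 1 - stripe_index) % num_drives

# With 4 drives, parity rotates across drives 3, 2, 1, 0, then wraps to 3.
rotation = [parity_drive(s, 4) for s in range(5)]
```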
Parity data allows a RAID storage device to reconstruct lost or corrupted data. In the example illustrated in FIG. 1, the RAID can recover from a failure of Disk 1 by using the accessible user and parity data on Disk 0, Disk 2, and Disk 3 to reconstruct the lost data on Disk 1. For example, the RAID device uses the data in blocks A1, A3, and Ap to reconstruct the lost block A2.
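This recovery step can be sketched as follows. The block contents and the xor_blocks helper are hypothetical, but the identity used is exactly the one described above: a lost block equals the XOR of the surviving data blocks and the parity block.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks (hypothetical helper)."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

# Hypothetical contents of stripe A's data blocks on a 4-drive array.
a1, a2, a3 = b"\x0f\xf0", b"\x55\xaa", b"\x33\xcc"
ap = xor_blocks(a1, a2, a3)        # parity written when the stripe is stored

# The disk holding A2 fails; rebuild A2 from the surviving blocks.
recovered = xor_blocks(a1, a3, ap)
assert recovered == a2
```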
Parity blocks are usually computed using the Exclusive OR (XOR) operation on binary blocks of data. An XOR comparison takes two binary bits, represented as “0” and “1”, compares them, and outputs an XOR result of “0” or “1”. The XOR engine returns a “1” only if the two inputs are different; if both bits are the same, i.e., both “0”s or both “1”s, the output of the XOR engine is “0”.
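This truth table can be confirmed directly with the XOR operator, shown here as a minimal check:

```python
# XOR of two bits: the output is 1 only when the inputs differ.
truth_table = {(a, b): a ^ b for a in (0, 1) for b in (0, 1)}
# truth_table is {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```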
For example, as shown in Table 1:
for stripe 1, the XOR parity block may be placed in storage device 4,
for stripe 2, the XOR parity block may be placed in storage device 3,
for stripe 3, the XOR parity block may be placed in storage device 2, and
for stripe 4, the XOR parity block may be placed in storage device 1.
TABLE 1

           Storage    Storage    Storage    Storage
           Device 1   Device 2   Device 3   Device 4
Stripe 1   0100       0101       0010       0011
Stripe 2   0010       0000       0110       0100
Stripe 3   0011       0001       1010       1000
Stripe 4   0110       0001       1101       1010
The parity blocks are computed by running the XOR comparison on each block of data in the stripe: the first two blocks are XOR-ed, the result is XOR-ed against the third block, and the comparison continues across all storage devices in the array, except for the block where the parity is stored.
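This chained XOR can be verified against the 4-bit values in Table 1. The sketch below recomputes the parity for the first two stripes (the stored parity sits on device 4 for stripe 1 and on device 3 for stripe 2):

```python
from functools import reduce

# 4-bit data blocks from Table 1; the stored parity block is
# recomputed here by chaining XOR comparisons across the stripe.
stripe1_data = [0b0100, 0b0101, 0b0010]   # devices 1-3; parity on device 4
stripe2_data = [0b0010, 0b0000, 0b0100]   # devices 1, 2, 4; parity on device 3

parity1 = reduce(lambda acc, blk: acc ^ blk, stripe1_data)
parity2 = reduce(lambda acc, blk: acc ^ blk, stripe2_data)
```

Both results match the parity entries listed in Table 1 (0011 and 0110, respectively).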
Traditionally, data persisted in burst buffers or tiers of non-volatile memory, which are prone to failures, is protected by using a central controller and a parity-calculating RAID engine. This configuration requires copying the data to the CPU, calculating parity in the CPU, and then distributing the parity to the data channels. Typically, data to be striped across a set of storage devices is first written into the memory buffer of the CPU. The CPU then reads the data back in chunks (blocks) and calculates the XOR of the data to generate parity. The parity XOR data is then written back to the memory and subsequently transferred to the storage devices. This method requires all of the data to be buffered in the memory of the CPU, which may create a CPU bottleneck.
Referring to FIG. 2, which represents a typical RAID engine using a CPU, a host 10 sends a “write” data request to storage devices 12. The data is first written to a memory 14 attached to the CPU 16. In this arrangement, the data is sent to a PCIe switch 18, which forwards it to the CPU 16, which, in turn, passes the data into the memory 14. A memory controller 20 within the CPU 16 controls data writing to, and reading from, the memory 14.
The CPU 16 reads the data from the memory 14, performs an XOR of the data, and writes the computed parity data into the memory 14. The CPU 16 then instructs the storage devices 12 to read the data and parity from the memory 14, and the storage devices save the data internally.
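The data flow just described can be mimicked in a short simulation. The function below is purely illustrative (the chunk size and the three-data-drive layout are assumptions), but it follows the same sequence: buffer the data in CPU memory, read it back in chunks, compute XOR parity, write the parity back to memory, and then hand data and parity to the devices.

```python
def cpu_raid_write(host_data: bytes, num_data_drives: int = 3):
    """Simulate the CPU-buffered RAID write path (illustrative only)."""
    chunk = len(host_data) // num_data_drives
    memory = bytearray(host_data)                  # 1. buffer in CPU memory
    blocks = [bytes(memory[i * chunk:(i + 1) * chunk])
              for i in range(num_data_drives)]     # 2. CPU reads back chunks
    parity = bytearray(chunk)                      # 3. CPU computes XOR parity
    for blk in blocks:
        for i, byte in enumerate(blk):
            parity[i] ^= byte
    memory += parity                               # 4. parity written to memory
    return blocks + [bytes(parity)]                # 5. devices pull data + parity

stripes = cpu_raid_write(b"\x01\x02\x03\x04\x05\x06")
```

Note that every user byte crosses the CPU memory interface several times (write-in, read-back, and parity write-out), which is the source of the bottleneck discussed next.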
In this arrangement, all of the data is buffered in the memory 14, requiring a very high transfer rate across the memory interface. This scheme requires the memory interface of the CPU to be more than three times faster than the data transfer rate of the array.
In addition, the reliance of the XOR operation in this arrangement on an expensive CPU and/or GPU, as well as the need for additional software to be written for the CPU and GPU operation, results in a complex and expensive scheme, which also has a large footprint and elevated cooling and power consumption requirements.
It is therefore desirable to provide XOR parity data generation in an efficient, inexpensive, and simple manner.