Computer systems generally employ data storage devices, such as disk drive devices or solid-state storage devices, for storage and retrieval of large amounts of data. Arrays of solid-state storage devices, such as flash memory, phase change memory, memristors, or other non-volatile storage units, may also be used in data storage systems.
The most common type of storage device array is the RAID (Redundant Array of Inexpensive (Independent) Drives). The main concept of the RAID is the ability to virtualize multiple drives (or other storage devices) into a single drive representation. A number of RAID schemes have evolved, each designed on the principles of aggregated storage space and data redundancy.
Most of the RAID schemes employ an error protection scheme called “parity”, which is a widely used method in information technology to provide fault tolerance in a given set of data. For example, in the RAID-5 data structure, data is striped across the hard drives, with a dedicated parity block for each stripe. The parity blocks are computed by running the XOR operation across the blocks of data in the stripe. The parity is responsible for the data fault tolerance. In operation, if one disk fails, a new drive can be put in its place and the RAID controller can rebuild the data automatically using the parity data.
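The parity computation and single-drive rebuild described above can be illustrated with a minimal Python sketch. The function names (xor_blocks, make_stripe, rebuild) and the sample block contents are hypothetical and chosen only for illustration; this is not the implementation of any particular RAID controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the i-th byte of every block together.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks of one stripe followed by their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from the surviving blocks.

    Because XOR is its own inverse, XOR-ing every surviving block
    (data and parity alike) yields the missing block.
    """
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Hypothetical stripe of three data blocks, four bytes each.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
stripe = make_stripe(data)

# Simulate losing the second drive and rebuilding its block.
recovered = rebuild(stripe, 1)
```

In this sketch the same routine regenerates a lost data block or a lost parity block, which is why a single parity block per stripe tolerates the failure of any one drive.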
Current RAID engines generally use a CPU (or GPU) with a DMA (Direct Memory Access) capability attached to a large memory to perform XOR operations to generate parity. Typically, data to be striped across a set of drives is first written into the memory buffer of the CPU. The CPU then reads the data back in chunks (blocks) and calculates the XOR of the data to generate parity. The parity XOR data is then written back to the memory, and subsequently is “flashed” to the storage disks. This method requires all of the data to be buffered in the memory of the CPU.
Referring to FIG. 1 representing a typical RAID engine using a centralized CPU for computational operations, when a host 10 sends a “write” data request to storage devices 12, the data is first written to a memory 14 attached to the CPU 16. In this arrangement, the data is sent to a PCIe switch 18 that forwards it to the CPU 16 which in turn passes the data into the memory 14. A memory controller 20 within the CPU 16 controls data writing to and reading from the memory 14.
The CPU 16 reads the data from the memory 14, performs an XOR of the data, and then writes the resulting parity back into the memory 14. The CPU 16 then instructs the storage devices 12 to read the data and parity from the memory 14 and save them internally. This conventional centralized CPU scheme may create a significant bottleneck in data migration through the data storage system.
In this arrangement, all of the data is buffered in the memory 14, which requires a very high data transfer rate at the Memory Interface. This scheme requires the Memory Interface to the CPU to be 3× (+2× for parity) faster than the transfer rate of the data array.
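One way to read the 3× (+2× for parity) figure is to count how many times each byte crosses the Memory Interface in the flow of FIG. 1: host data is written into the memory 14, read back by the CPU 16, and read again by the storage devices 12, while parity is written by the CPU 16 and then read by the storage devices 12. The Python sketch below tallies these crossings. The function name, the stripe geometry, and this interpretation of the multipliers are assumptions made for illustration and are not figures taken from a specific system.

```python
def memory_interface_traffic(data_bytes, data_blocks_per_stripe,
                             parity_blocks_per_stripe=1):
    """Tally bytes crossing the Memory Interface for one buffered write.

    Assumed crossings, following the centralized-CPU flow:
      data:   host -> memory, memory -> CPU, memory -> storage   (3 passes)
      parity: CPU -> memory, memory -> storage                   (2 passes)
    """
    if data_blocks_per_stripe <= 0:
        raise ValueError("a stripe needs at least one data block")
    # Parity volume scales with the ratio of parity to data blocks.
    parity_bytes = data_bytes * parity_blocks_per_stripe / data_blocks_per_stripe
    data_traffic = 3 * data_bytes
    parity_traffic = 2 * parity_bytes
    total = data_traffic + parity_traffic
    return {
        "parity_bytes": parity_bytes,
        "data_traffic": data_traffic,
        "parity_traffic": parity_traffic,
        "total_traffic": total,
        # Memory Interface bytes moved per byte of host data.
        "amplification": total / data_bytes,
    }


# Hypothetical example: 8 units of host data striped as 4 data blocks + 1 parity.
result = memory_interface_traffic(8, 4)
```

Under these assumptions, a stripe of four data blocks and one parity block moves 3.5 bytes across the Memory Interface for every byte the host writes, which illustrates why the buffered scheme demands a memory interface several times faster than the incoming data rate.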
In addition, the reliance of the XOR (or any other requested Boolean logical or arithmetic) operation in this arrangement on an expensive CPU (and/or GPU), as well as the need for additional software to be written for the CPU (and GPU) operation, results in a complex and expensive scheme, which also has a large footprint and elevated cooling and power consumption needs.
It is therefore desirable to provide a data storage system with expanded logic and arithmetic functionality beyond RAID, which may perform computations in an efficient, inexpensive, and simple manner without relying on the CPU (or GPU) for the compute operation or for buffering data.