A traditional data volume combines multiple storage devices (disks) to provide more capacity, data redundancy, and I/O bandwidth. Data stored on a data volume may be replicated using one or more replication schemes, which are used to recover data in the event of system or network failures. For instance, the replication scheme known as RAID-1 (RAID stands for redundant array of inexpensive disks) creates an exact copy (or mirror) of data on two or more disks. An N-way mirror includes N disks (where N>1) and maintains N identical copies of data, one copy per disk.
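A minimal sketch of the N-way mirror described above, assuming an in-memory model in which each disk is a Python list of blocks. The class `NWayMirror` and its methods are hypothetical and do not correspond to any particular RAID implementation. The sketch shows why keeping N copies lets the data survive the loss of any N−1 disks.

```python
from typing import List, Optional, Set


class NWayMirror:
    """Hypothetical in-memory N-way mirror (RAID-1): each write is copied to every
    healthy disk, so any one surviving disk can serve reads."""

    def __init__(self, n_disks: int, blocks_per_disk: int) -> None:
        if n_disks < 2:
            raise ValueError("an N-way mirror requires N > 1 disks")
        # Each disk is modeled as a fixed-size list of optional blocks.
        self.disks: List[List[Optional[bytes]]] = [
            [None] * blocks_per_disk for _ in range(n_disks)
        ]
        self.failed: Set[int] = set()

    def write(self, lba: int, data: bytes) -> None:
        # Store one identical copy per healthy disk.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[lba] = data

    def fail(self, disk_index: int) -> None:
        # Mark a disk as failed; its contents are no longer readable.
        self.failed.add(disk_index)

    def read(self, lba: int) -> bytes:
        # Any healthy disk holding the block can serve the read.
        for i, disk in enumerate(self.disks):
            if i not in self.failed and disk[lba] is not None:
                return disk[lba]
        raise IOError(f"block {lba}: no surviving copy")
```

With `NWayMirror(3, 4)`, a block written before two disk failures can still be read from the third disk; once all three disks have failed, `read` raises `IOError`. Rebuilding a replacement disk is not modeled here.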
In many RAID schemes, data to be stored is segmented into data blocks, and the resulting data blocks are then used to compute additional parity blocks, for instance using an XOR function. Both the data blocks and the parity blocks are then written, in stripes, to multiple disks within the RAID group. For instance, the RAID-6 replication scheme records two independent parity blocks per stripe in order to provide protection against a double disk failure.
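The XOR parity computation can be sketched as follows. This is a minimal illustration with hypothetical helper names (`xor_blocks`, `compute_parity`, `reconstruct_missing`). It covers only single XOR parity; the second, independent parity of RAID-6 is typically computed with a different code (for example, Reed-Solomon arithmetic over GF(2^8)), which this sketch does not attempt.

```python
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def compute_parity(data_blocks: List[bytes]) -> bytes:
    # P = D0 ^ D1 ^ ... ^ D(k-1)
    return xor_blocks(data_blocks)


def reconstruct_missing(surviving_blocks: List[bytes]) -> bytes:
    # Because X ^ X = 0, XOR-ing all surviving blocks of a stripe
    # (data and parity) yields the single missing block.
    return xor_blocks(surviving_blocks)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(data)
print(parity.hex())
# Lose data[1] and rebuild it from the other data blocks plus parity.
print(reconstruct_missing([data[0], data[2], parity]) == data[1])
```

Because XOR is its own inverse, the same function serves both for generating parity and for rebuilding any one lost block of a stripe.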
RAID-based data replication ensures continuous availability and protection of data, and it also improves I/O performance by spreading the I/O workload among multiple independent disks. For all of these reasons, data volumes that combine multiple disks organized in RAID groups are commonly deployed. The corresponding solutions and products, with RAID implemented either in the hardware/firmware of RAID arrays or in operating-system software (such as Linux or Solaris), are practically ubiquitous.
A typical data volume includes one or more RAID groups of disks. A data volume may also include spare disks to support automated (hot-plug) replacement of failed disks in the volume. More recently, vendors have added support for solid state drives (SSDs) to improve the read and write performance of data volumes through optimized logging and caching.
FIG. 1 illustrates a typical data volume 10 with a single RAID-5 group 12, which in this embodiment includes 4 data disks. The data volume 10 also includes two spare disks 13. In general, the RAID-5 replication scheme works as follows. Each logical block submitted for writing by an application (for instance, by a filesystem) is first segmented into data blocks. Because the RAID-5 group 12 includes 4 data disks, an additional parity block is generated for each set of 3 data blocks. The 3 data blocks and the parity block together are said to form a stripe. Logical blocks are then written to the data volume 10 in stripes, wherein each stripe spans all 4 disks and includes 3 data blocks and one parity block. For a RAID-5 group including N disks, each stripe consists of (N−1) data blocks and one parity block.
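The segmentation into stripes of (N−1) data blocks plus one parity block can be sketched as below. This is a hypothetical illustration, not the layout of any specific product: zero-padding of the final stripe and the rotating position of the parity block from stripe to stripe are assumptions of the sketch (the text above states only that each stripe spans all disks).

```python
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    # Byte-wise XOR of equal-length blocks (the parity function).
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def raid5_stripes(data: bytes, n_disks: int, block_size: int) -> List[List[bytes]]:
    """Segment `data` into RAID-5 stripes of (n_disks - 1) data blocks plus one
    parity block. Each returned stripe is a list indexed by disk number."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    data_per_stripe = n_disks - 1
    stripe_bytes = data_per_stripe * block_size
    # Zero-pad the tail so the last stripe is full (an assumption of this sketch).
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    stripes: List[List[bytes]] = []
    for s, offset in enumerate(range(0, len(padded), stripe_bytes)):
        chunk = padded[offset:offset + stripe_bytes]
        blocks = [chunk[i * block_size:(i + 1) * block_size]
                  for i in range(data_per_stripe)]
        parity = xor_blocks(blocks)
        # Hypothetical layout: the parity slot rotates by one disk per stripe.
        parity_disk = (n_disks - 1 - s) % n_disks
        blocks.insert(parity_disk, parity)
        stripes.append(blocks)
    return stripes


# 4-disk group as in FIG. 1: each stripe = 3 data blocks + 1 parity block.
for stripe in raid5_stripes(b"ABCDEFGHIJKL", n_disks=4, block_size=2):
    print(stripe)
```

Rotating the parity position spreads parity writes across all disks instead of concentrating them on one; a fixed parity disk would also satisfy the (N−1)+1 stripe structure described above.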
In general, replication schemes used in the existing data volumes are subject to the following issues.
First and foremost, even when substantial redundancy is configured, conventional replication schemes provide no protection against the simultaneous failure of more drives than that redundancy covers, or against failure of the RAID controller itself. For instance, the RAID-5 group shown in FIG. 1 cannot withstand the simultaneous failure of any 2 of its 4 disks.
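To make the double-failure limitation concrete, the following sketch assumes a 4-disk RAID-5 stripe of one-byte blocks (D0, D1, D2, P) with plain XOR parity; the helper `consistent_pairs` is hypothetical. A single parity equation pins down one lost block, but with two lost blocks every one of the 256 possible values of D0 has a matching D1, so the original data cannot be determined.

```python
from itertools import product
from typing import List, Tuple


def consistent_pairs(d2: int, parity: int) -> List[Tuple[int, int]]:
    """4-disk RAID-5 stripe of 1-byte blocks (D0, D1, D2, P) with D0 and D1 lost:
    return every (d0, d1) pair that agrees with the surviving D2 and P."""
    target = parity ^ d2  # parity constrains only d0 ^ d1
    return [(d0, d1) for d0, d1 in product(range(256), repeat=2)
            if d0 ^ d1 == target]


d0, d1, d2 = 0x12, 0x34, 0x56
parity = d0 ^ d1 ^ d2

# Single failure (D0 lost): exactly one value fits, so it is recoverable.
print(parity ^ d1 ^ d2 == d0)
# Double failure (D0 and D1 lost): 256 pairs fit, so the data is ambiguous.
candidates = consistent_pairs(d2, parity)
print(len(candidates), (d0, d1) in candidates)
```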
Redundancy itself comes at a price: reduced total capacity of the data volume. For instance, the capacity of a RAID-1 group of N same-size disks (N>=2) equals the capacity of a single disk.
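The capacity cost of redundancy can be computed directly. The sketch below uses a hypothetical helper `usable_capacity` and the conventional minimum disk counts (2 for RAID-1, 3 for RAID-5, 4 for RAID-6); the 4 × 2 TB configuration in the usage loop is an illustrative assumption.

```python
def usable_capacity(level: str, n_disks: int, disk_size: float) -> float:
    """Usable capacity of one RAID group built from n_disks equal-size disks."""
    # Conventional minimum disk counts and data-disk equivalents per level.
    rules = {
        "raid1": (2, lambda n: 1),      # N-way mirror: N copies of one disk's worth
        "raid5": (3, lambda n: n - 1),  # one parity block per stripe
        "raid6": (4, lambda n: n - 2),  # two parity blocks per stripe
    }
    if level not in rules:
        raise ValueError(f"unsupported level: {level}")
    min_disks, data_disks = rules[level]
    if n_disks < min_disks:
        raise ValueError(f"{level} needs at least {min_disks} disks")
    return data_disks(n_disks) * disk_size


for level in ("raid1", "raid5", "raid6"):
    cap = usable_capacity(level, n_disks=4, disk_size=2.0)
    print(f"{level}: {cap} TB usable of 8.0 TB raw")
```

For the same four disks, a 4-way mirror keeps only one disk's worth of capacity, while RAID-5 and RAID-6 trade one or two disks' worth of capacity, respectively, for their parity.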
Finally, conventional replication schemes make no distinction between the data disks within a RAID group, and distribute data blocks and parity blocks, in stripes, uniformly across the entire set of data disks. Recent advances in flash memory technology introduce a number of new requirements in this regard. In particular, rapid improvements in the performance, reliability, and storage capacity of solid state drives (SSDs) make it possible, and often desirable, to use SSDs within data volumes.
Compared with traditional hard drives, SSDs provide a number of advantages, including better random-access performance (SSDs eliminate seek time), silent operation (no moving parts), and lower power consumption. On the other hand, SSDs are more expensive and have limited lifetimes in terms of the maximum number of program-erase (P/E) cycles.
The pros and cons of flash memory technology, combined with the strict existing requirements on data availability and fault tolerance, therefore translate into a requirement for a new type of data volume: a heterogeneous data volume that includes different classes of data disks and supports non-uniform data striping.
Existing RAIDs do not differentiate between data disks as far as data read and write operations are concerned. It can therefore be said that existing RAIDs include a single class of data disks, henceforth called “primary”. Accordingly, what is desired is a system and method to address the above-identified issues. The present invention addresses such a need.