Typical data volumes consist of one or more storage disks with similar characteristics configured in a specific replication scheme to provide increased capacity, I/O performance and high availability of the data. Conventional replication schemes are commonly implemented as Redundant Arrays of Inexpensive Disks (RAID); a variety of RAID configurations (or schemes) exist to suit different storage needs. For instance, the RAID-1 scheme maintains an exact copy (or mirror) of data blocks on two or more disks. An N-way mirror is said to include N disks (where N>1) that maintain N identical copies of data, one copy per disk.
Data volumes typically comprise one or more RAID groups of disks and, optionally, spare disks that can be hot-plugged to the volume in case of a failure of a data disk.
FIG. 1 illustrates a typical data volume 100 with a single RAID-5 group 101 comprising four storage disks. The data volume 100 also includes three spare disks 102. In general, the RAID-5 replication scheme works as follows. Each logical block submitted by an application for writing is first segmented into data blocks. Because the RAID-5 group 101 includes four disks, an additional parity block is generated for each set of three data blocks. The three data blocks and the parity block in combination are said to be a stripe. Logical blocks are then written to the data volume 100 in stripes, wherein each stripe spans all four disks and includes three data blocks and one parity block. The RAID-5 scheme improves the overall I/O performance and provides for data recovery should any one of the disks fail. In the event of, for instance, a latent sector error, the corresponding corrupted data block may be reconstructed from the remaining data blocks and the parity block of the same stripe. RAID-5 will survive a total loss of a single drive as well.
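The stripe construction and single-block recovery described above can be sketched as follows. This is a minimal illustration, not a description of any particular RAID implementation: it assumes the parity block is the byte-wise XOR of the data blocks (the usual choice for RAID-5, although the text above does not name it), keeps the parity block in the last position of every stripe rather than rotating it across disks, and uses the hypothetical helper names `xor_blocks`, `make_stripe` and `reconstruct`.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Build a stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, lost_index):
    """Rebuild the block at lost_index from the surviving blocks of the stripe.

    Works for a lost data block and for a lost parity block alike, because
    the XOR of all blocks in a stripe is zero.
    """
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Three data blocks plus one parity block, as in the four-disk group of FIG. 1.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
assert reconstruct(stripe, 1) == b"BBBB"
```

Losing two blocks of the same stripe leaves too little information for this recovery, which is the limitation discussed in the paragraphs that follow.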
In general, replication schemes used in existing data volumes are subject to the following issues:
First and foremost, even with substantial redundancy configured in, the conventional replication schemes offer no protection against simultaneous failure of multiple drives within the RAID or of the RAID controller itself. For instance, the RAID-5 group shown in FIG. 1 will not be able to withstand simultaneous failure of any two of its four disks. This simple example demonstrates the need to maintain remote copies using mechanisms external to the RAID, such as third party backup and disaster recovery software.
Redundancy itself has a price: the reduced effective capacity of the data volume. For instance, the capacity of a RAID-1 group of N same-size disks (N>=2) would be equal to the capacity of a single disk. Hence, there is a tradeoff that needs to be made between data protection (via multiple copies of data) and effective storage capacity.
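The capacity cost of redundancy can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, all disks are assumed to be the same size, and the RAID-5 formula (one disk's worth of capacity given up to parity) is the standard one, consistent with the three-data-blocks-plus-one-parity stripe of FIG. 1.

```python
def raid1_capacity(disk_size, n_disks):
    """Usable capacity of an N-way mirror: every disk holds the same data."""
    if n_disks < 2:
        raise ValueError("a mirror needs at least two disks")
    return disk_size


def raid5_capacity(disk_size, n_disks):
    """Usable capacity of a RAID-5 group: one disk's worth goes to parity."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least three disks")
    return disk_size * (n_disks - 1)


# Four disks of 1 TB each (sizes expressed in TB).
assert raid1_capacity(1, 4) == 1   # 4-way mirror: 1 TB usable out of 4 TB raw
assert raid5_capacity(1, 4) == 3   # RAID-5: 3 TB usable out of 4 TB raw
```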
Yet another tradeoff that storage administrators and IT departments often need to consider is the tradeoff between storage capacity and I/O performance. In particular, rapid advances in the performance, reliability, and storage capacity of solid state drives (SSDs) make them feasible for use within data volumes. In comparison to rotating hard disk drives (HDDs), SSDs offer better random I/O performance, silent operation and lower power consumption due to the absence of any moving parts. SSDs, however, have a number of limitations, including dependency on strict 4K or 8K I/O alignment, certain requirements on the I/O block size needed for optimal performance, degrading performance due to wearing of storage cells, lower capacity compared to HDDs, and higher price. All of the above renders SSDs suitable for storage of certain types of data, in particular data that requires superior I/O performance (better IOPS).
Similarly, available HDDs differ substantially in performance, depending on the vendor, model, capacity and other characteristics. Using disks of different types in a conventional data volume will yield sub-optimal performance.
For instance, consider a read workload in a data volume containing two disks configured in a RAID-1 (mirror) scheme, with one disk operating at 100 MB/second and the other at 200 MB/second for read operations. Traditional data volumes will spread the I/O workload evenly among all the disks in the volume. The combined read throughput in this example, assuming a conventional round-robin scheme employed in the data volume, will average up to approximately 133 MB/second. This is of course better than the 100 MB/second of the slower disk but certainly much worse than the 200 MB/second expected of the faster disk.
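One way to arrive at the 133 MB/second figure is sketched below. It rests on an assumption that the text above does not spell out: equal-size read requests alternate between the two disks and are serviced one at a time, so the average throughput is the harmonic mean of the individual disk rates. The function name `round_robin_throughput` is hypothetical.

```python
def round_robin_throughput(rates_mb_s):
    """Average throughput when equal-size requests alternate across disks.

    Reading one megabyte from each disk in turn moves len(rates_mb_s)
    megabytes in sum(1 / rate) seconds, i.e. the harmonic mean of the rates.
    """
    return len(rates_mb_s) / sum(1.0 / rate for rate in rates_mb_s)


# Two mirrored disks reading at 100 MB/second and 200 MB/second.
combined = round_robin_throughput([100, 200])
assert abs(combined - 133.33) < 0.01
```

Under this model the slower disk dominates the result, because an even split of requests makes the volume spend two thirds of its time waiting on that disk.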
An attempt to utilize both SSDs and HDDs in a conventional volume will also produce sub-optimal results. Due to the seek latency and rotational delay, existing data access mechanisms utilize I/O request queuing and reordering specifically fine-tuned for rotating disks. The corresponding I/O processing logic is unnecessary for SSDs because SSDs have no rotational delay or seek latency.
Further, an attempt to utilize disks with different characteristics within a conventional data volume may adversely affect not only the I/O performance of the data volume but its reliability as well. For instance, SSDs have limited lifetimes in terms of the maximum number of program-erase (P/E) cycles. The life span of SSDs can be increased if the data access mechanisms that write to SSDs compress the data before writing. This and similar differentiation targeting heterogeneous storage media is simply not designed into conventional data volumes.
To summarize, existing data volumes provide the benefits of transparent access to multiple disks and replication schemes for applications such as filesystems, databases, search engines, and cloud storage systems. The associated tradeoffs and limitations include the tradeoff between effective capacity and levels of replication (redundancy), and the tradeoff between capacity and I/O performance. This is exactly why optimizing applications that access different types of data for overall performance and reliability often comes with the additional complexity of accessing multiple data volumes, one volume per type of data. For example, the I/O performance of a database will improve if its indexes are stored on SSDs. It may not be feasible, however, to store the database tables (as opposed to the indexes) on SSDs as well, because the tables may require multiple terabytes or even petabytes of capacity. Hence, this tradeoff between capacity and I/O performance currently cannot be resolved within the scope of a single data volume.
Similarly, existing local filesystems contain files and directories that, from a user perspective, often require different access latencies, priorities of I/O processing, capacities, replication levels, or a combination of all of the above. Existing local filesystems do not, however, span multiple data volumes. Therefore, the requirement to handle the files and directories of a given filesystem differently (for instance, on a per-file-type or wildcard-match basis) is currently impossible to address.
The present invention addresses these drawbacks and limitations of traditional data volumes by accommodating heterogeneous storage disks in a volume and making use of the storage characteristics of the disks to intelligently route data to and from the volume's disks. The present invention provides for improved I/O performance and reliability of the data volume, optimal usage of its capacity, and the capability to manage data differently and optimally, on a per-type-of-data basis.