Redundant Array of Independent Disks (abbr. RAID) is the most widely used approach in the storage field for improving system reliability. It combines multiple physical storage devices according to a given algorithm and presents them as a single, high-performance, reliable virtual volume, e.g., by configuring two hard disks into RAID-1, or four or five disks into RAID-4, RAID-5, or RAID-6.
A RAID controller manages a physical storage array so that it works as a logical unit, and protects the data on that array. Through the RAID controller, the system sees logical drives and does not need to manage the physical devices directly. The functionality of a RAID controller can be implemented in either hardware or software. In a typical disk array, the RAID controller is the connection device between the host bus adapters and the hard disks; it supports fast computation and input/output (I/O) and provides advanced management functionality, including caching and extensibility.
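As an illustration of how a controller can present one logical volume on top of several physical disks, the following minimal sketch maps a logical block number to a physical location. It assumes a RAID-5 style layout with one parity block per stripe and a parity position that rotates from the last disk toward the first; this layout and the function name `map_logical_block` are hypothetical choices made for the example, not the layout of any particular controller.

```python
def map_logical_block(lba: int, num_disks: int) -> tuple[int, int, int]:
    """Map a logical block address to (stripe, data_disk, parity_disk).

    Assumed layout (illustrative only): each stripe holds num_disks - 1
    data blocks plus one parity block, and the parity block moves one
    disk to the left on every successive stripe.
    """
    if num_disks < 3:
        raise ValueError("a RAID-5 style layout needs at least 3 disks")
    data_per_stripe = num_disks - 1
    stripe = lba // data_per_stripe          # which stripe holds the block
    offset = lba % data_per_stripe           # position among the data blocks
    parity_disk = (num_disks - 1 - stripe) % num_disks
    # Data blocks fill the disks in order, skipping the parity disk.
    data_disk = offset if offset < parity_disk else offset + 1
    return stripe, data_disk, parity_disk


if __name__ == "__main__":
    for lba in range(8):
        print(lba, map_logical_block(lba, num_disks=4))
```

The host only ever uses the logical block number; the stripe and disk indices stay internal to the controller, which is what allows the volume to be managed as a single logical unit.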
RAID was first proposed in 1988. In recent years, the world has witnessed RAID becoming a multibillion-dollar industry. Research on RAID usually focuses on how to design and optimize an individual RAID volume. In practice, however, a RAID controller can enable up to hundreds of physical devices to work simultaneously and can manage dozens of logical volumes for various applications. In this situation, a set of storage devices (most of them homogeneous) forms a single RAID and works together as one volume. Different volumes use different sets of disks, which work independently of each other. Although volume-level management simplifies RAID management and provides performance and fault isolation, it neglects the potential for system-wide optimization.
Various kinds of storage devices exist, such as the traditional Hard Disk Drive (abbr. HDD) and the flash-based Solid State Disk (abbr. SSD). In recent years, several emerging devices (e.g., memristors) have also been developed. Because their implementation mechanisms differ, these types of storage devices each have their own technical and economic characteristics. For instance, compared with other devices, an HDD has larger capacity and a relatively lower price, but its random read and write performance is poor. An SSD offers faster random reads and a smaller physical size, but it has a higher price and suffers from the erase-before-write and wear-out issues. Some techniques for mixing different types of storage devices have already been proposed. However, they are usually restricted to optimizing an individual RAID.
As flash-based SSDs continue to gain popularity in enterprise data centers, it is not uncommon to see a RAID controller with a mix of hard disks and SSDs, where a virtual volume may span a set of hard disks and SSDs. Although a RAID of SSDs is able to deliver high IOPS (Input/Output Operations per Second), researchers have demonstrated that applying RAID algorithms directly to SSDs cannot make full use of the performance characteristics of SSDs, and have therefore proposed RAID algorithms specially designed for SSDs. However, those RAID algorithms are still restricted to optimizing an individual RAID. Furthermore, compared with a RAID made up of hard disks, an SSD RAID has much smaller storage capacity and a higher price. In addition, due to the erase-before-write and wear-out issues in SSDs, failures are more likely to occur in multiple logical volumes of an enterprise disk array, which serves I/O requests from different types of applications.
In traditional storage systems, accessing parity information in an individual RAID usually becomes the performance bottleneck. Therefore, in storage systems, and especially in enterprise data centers, one key challenge is to address the performance bottleneck of parity accesses in an individual RAID volume without adding any extra hardware I/O resources.
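To make the parity bottleneck concrete, the following minimal sketch models a single stripe protected by XOR parity (as in RAID-4 or RAID-5) and counts the device accesses needed for a small write. The class `ParityStripe`, its `io_count` field, and the in-memory byte blocks are illustrative assumptions, not part of any real controller; the read-modify-write update shown is the standard parity update, new parity = old parity XOR old data XOR new data.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ParityStripe:
    """One stripe: several data blocks plus one XOR parity block (in memory)."""

    def __init__(self, data_blocks: list[bytes]):
        self.data = list(data_blocks)
        self.parity = xor_blocks(*self.data)
        self.io_count = 0  # hypothetical counter of device accesses

    def small_write(self, index: int, new_block: bytes) -> None:
        """Update one data block using read-modify-write on the parity."""
        old_block = self.data[index]   # read old data
        old_parity = self.parity       # read old parity
        self.io_count += 2
        self.data[index] = new_block   # write new data
        self.parity = xor_blocks(old_parity, old_block, new_block)  # write new parity
        self.io_count += 2

    def reconstruct(self, index: int) -> bytes:
        """Rebuild a lost data block from the parity and the surviving blocks."""
        survivors = [b for i, b in enumerate(self.data) if i != index]
        return xor_blocks(self.parity, *survivors)


if __name__ == "__main__":
    stripe = ParityStripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
    stripe.small_write(1, b"\xaa\xbb")
    print("accesses for one small write:", stripe.io_count)
    print("recovered block:", stripe.reconstruct(1))
```

In this model, one logical write turns into four device accesses, two of which touch the parity block. Every small write to the stripe therefore contends for the device that holds its parity, which is why parity accesses tend to dominate under write-heavy workloads.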