In modern computer systems, a redundant array of independent disks (RAID) is a system that stores data across multiple disk drives combined into a single logical unit. Data stored in the RAID system is distributed across these drives according to the RAID level employed, which determines the distribution scheme, such as data replication (mirroring) or data division (striping). The standard RAID levels are zero (0) through six (6). Standard RAID levels two (2) through six (6) use an error protection scheme implemented through parity: RAID 2 uses a Hamming code built from multiple parity checks, RAID levels 3 through 5 feature a single parity, and RAID 6 features two separate parities.
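The single-parity scheme used by RAID levels 3 through 5 can be illustrated with byte-wise XOR. The sketch below is a minimal model, assuming equal-length blocks held in memory; the helper names `xor_blocks`, `compute_parity`, and `reconstruct` are hypothetical. It shows how one parity block lets any single lost data block in a stripe be rebuilt. RAID 6's second parity (commonly a Reed-Solomon style code) is not modeled here.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equal-length blocks."""
    length = len(blocks[0])
    out = bytearray(length)
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks in a stripe must have equal length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def compute_parity(data_blocks):
    """Single parity for one stripe: XOR of all data blocks (hypothetical helper)."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block: XOR of the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(stripe)                      # b"\xee\x22"
rebuilt = reconstruct([stripe[0], stripe[2]], parity)  # recovers stripe[1]
print(parity.hex(), rebuilt.hex())                   # ee22 1020
```

Because XOR is its own inverse, the same helper serves both for computing parity and for recovering a lost block, which is why a single parity tolerates exactly one failed drive per stripe.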
A RAID storage system can be implemented as a log-structured system. In a log-structured storage system, existing valid data on disk is never overwritten in place; instead, every write places new data at a new location. A log-structured system accomplishes this by treating a disk as a single “log” and appending data to the end of the log. Free space on the disk is managed by “cleaning,” that is, reclaiming the portions of the log that hold out-of-date data.
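The append-and-clean behavior can be modeled in a few lines. This is an illustrative sketch only, assuming an in-memory list stands in for the disk; the class `ToyLog` and its methods are hypothetical. Writes always append and repoint an index, so superseded records stay in the log until cleaning copies only the live records into a fresh log. Real log-structured systems typically clean one segment at a time rather than compacting the whole log as done here.

```python
class ToyLog:
    """In-memory model of a log-structured store (hypothetical, illustrative only)."""

    def __init__(self):
        self.log = []    # append-only list of (address, data) records
        self.index = {}  # logical address -> position of its live record in self.log

    def write(self, address, data):
        # Never overwrite in place: append a new record and repoint the index.
        self.log.append((address, data))
        self.index[address] = len(self.log) - 1

    def read(self, address):
        return self.log[self.index[address]][1]

    def live_fraction(self):
        # Share of log records that are still current; the rest are stale.
        return len(self.index) / len(self.log) if self.log else 1.0

    def clean(self):
        # Copy only live records into a fresh log; superseded records are dropped.
        new_log, new_index = [], {}
        for position, (address, data) in enumerate(self.log):
            if self.index.get(address) == position:
                new_index[address] = len(new_log)
                new_log.append((address, data))
        self.log, self.index = new_log, new_index


store = ToyLog()
store.write(7, b"v1")
store.write(8, b"a")
store.write(7, b"v2")   # appended; the old record for address 7 becomes stale
print(len(store.log), store.live_fraction())  # 3 records, 2 of them live
store.clean()
print(len(store.log), store.read(7))          # 2 b'v2'
```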
In a log-structured RAID storage system, a write operation may cover only part of a stripe. Such partial stripe writes incur additional overhead because they require a read-modify-write operation: the existing data and parity information must be read from disk, modified, and written back to disk to complete the write. Furthermore, partial stripe writes can lead to data corruption during system failures, because the separate data and parity update operations can be interrupted in unpredictable ways, leaving the parity inconsistent with the data. One common solution is to buffer changes to a given stripe in non-volatile memory (e.g., non-volatile random access memory such as battery-backed random access memory (RAM) or flash memory) before issuing a partial stripe write operation. However, using non-volatile memory increases the cost of the system and complicates its design, particularly in the case of highly available systems. Additionally, non-volatile memory does not necessarily ensure reliability and data integrity.
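For a single-parity stripe, the read-modify-write of a partial stripe update can be expressed as a parity delta: new parity = old parity XOR old data XOR new data. The sketch below is a minimal illustration, assuming single-byte blocks and hypothetical helper names (`xor_bytes`, `rmw_update`, `small_write_ios`). It counts the disk I/Os for updating k data blocks in one stripe (k + 1 reads and k + 1 writes), and it shows how an interruption between the data write and the parity write leaves stale parity that later reconstructs the wrong data.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_update(old_data, new_data, old_parity):
    """New parity after replacing one data block: P' = P xor D_old xor D_new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


def small_write_ios(blocks_updated):
    """(reads, writes) for a read-modify-write of `blocks_updated` data blocks:
    read each old data block plus the old parity, then write each new data
    block plus the new parity."""
    return blocks_updated + 1, blocks_updated + 1


stripe = [b"\x01", b"\x02", b"\x04"]
parity = b"\x07"                                  # 0x01 ^ 0x02 ^ 0x04

# Normal partial stripe write: block 1 changes from 0x02 to 0x08.
new_parity = rmw_update(stripe[1], b"\x08", parity)
print(new_parity.hex(), small_write_ios(1))       # 0d (2, 2)

# Interrupted write: data reached disk, parity did not (stale 0x07 remains).
on_disk = [stripe[0], b"\x08", stripe[2]]
rebuilt_block0 = xor_bytes(xor_bytes(on_disk[1], on_disk[2]), parity)
print(rebuilt_block0 == on_disk[0])               # False: block 0 rebuilt as 0x0b
```

The failure case is why buffering a stripe's changes before issuing the partial write is attractive: the data and parity updates are two distinct disk operations, and nothing in the parity itself records whether both completed.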