For many years large-scale non-volatile data storage systems were based on arrays of magnetic disk drives. Such systems are increasingly being supplanted by systems based on arrays of flash memory modules. Flash memory is a solid-state, non-volatile storage medium that can be electrically erased and reprogrammed.
One limitation of flash memory is that, although it provides random-access read and programming operations, it does not provide random-access rewrite or erase operations. Rather, flash memory must be erased and then rewritten on a block basis, where a block consists of hundreds or thousands of bits. A flash memory controller can work around this limitation by marking data that is no longer needed as “invalid” and then, at such time as additional storage space may be needed, consolidating the data that is still needed into blocks and rewriting them. Such a process is sometimes referred to as garbage collection. A flash memory controller commonly maintains an address translation table that relates the physical memory address at which data is stored in a flash memory module to a logical address.
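By way of illustration only, the invalidate-then-consolidate behavior described above can be sketched as follows. The class name FlashSketch, the constant PAGES_PER_BLOCK, and the division of a block into fixed-size "pages" are hypothetical simplifications introduced for this sketch, not features of any particular flash memory controller.

```python
# Hypothetical model: a block holds a fixed number of pages and can only be erased whole.
PAGES_PER_BLOCK = 4

class FlashSketch:
    def __init__(self, num_blocks):
        # Each block is a list of [logical_address, data, valid] entries.
        self.blocks = [[] for _ in range(num_blocks)]
        # Address translation table: logical address -> (block index, page index).
        self.table = {}

    def _append(self, logical, data, exclude=None):
        # Program the first free page found, skipping the excluded block.
        for b, block in enumerate(self.blocks):
            if b != exclude and len(block) < PAGES_PER_BLOCK:
                block.append([logical, data, True])
                self.table[logical] = (b, len(block) - 1)
                return
        raise RuntimeError("no free page; garbage collection needed")

    def write(self, logical, data):
        # No in-place rewrite: mark the old copy invalid, then program a new page.
        if logical in self.table:
            b, p = self.table[logical]
            self.blocks[b][p][2] = False
        self._append(logical, data)

    def read(self, logical):
        b, p = self.table[logical]
        return self.blocks[b][p][1]

    def collect(self, victim):
        # Garbage collection: copy still-valid data elsewhere, then erase the whole block.
        survivors = [(l, d) for l, d, valid in self.blocks[victim] if valid]
        for logical, data in survivors:
            self._append(logical, data, exclude=victim)
        self.blocks[victim] = []

flash = FlashSketch(num_blocks=2)
flash.write(0, "a")
flash.write(1, "b")
flash.write(0, "a2")   # invalidates the first copy of logical address 0
flash.write(1, "b2")   # invalidates the first copy of logical address 1
flash.write(2, "c")    # block 0 is full, so this lands in block 1
flash.collect(0)       # valid data moves to block 1; block 0 is erased
```

After collect(0), the two invalid pages have been reclaimed and the translation table points every logical address at its relocated copy.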
Another limitation of flash memory is that repeated erase and reprogramming cycles cause physical degradation of the memory cells. Thus, a memory cell that is erased and reprogrammed more times than another memory cell is likely to fail sooner than the other cell. For this reason, flash memory controllers attempt to distribute writes as evenly as possible across all blocks of cells.
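One simple way a controller might distribute writes evenly is to direct each new write to the free block that has been erased the fewest times. The following is a minimal sketch under that assumption; the function name pick_block and the per-block erase counters are hypothetical, and actual controllers may use different policies.

```python
def pick_block(erase_counts, free_blocks):
    """Return the free block with the fewest erase cycles (ties go to the lowest index)."""
    return min(free_blocks, key=lambda b: (erase_counts[b], b))

# Hypothetical erase counters for four blocks; block 1 is not currently free.
erase_counts = [5, 2, 9, 2]
chosen = pick_block(erase_counts, free_blocks=[0, 2, 3])
```

Here block 3 is selected because, among the free blocks, it has the lowest erase count.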
It is important that a host system be able to reliably access all of the data in the data storage system. However, a potential problem that affects data storage systems is that one or more of the devices can fail or malfunction in a manner that prevents the host system from accessing some or all of the data stored on that device. A “redundant array of inexpensive disks” (also known as a “redundant array of independent disks”) or “RAID” is a common type of data storage system that addresses the foregoing reliability problem by enabling recovery from the failure of one or more storage devices.
Various RAID schemes are known. The various RAID schemes are commonly referred to by a “level” number, such as “RAID-0,” “RAID-1,” “RAID-2,” etc. As illustrated in FIG. 1, a storage array 10 in a conventional RAID-5 system can include, for example, four storage devices 12, 14, 16 and 18 (e.g., arrays of disk drives or flash memory modules). In accordance with the RAID-5 scheme, data blocks are distributed across storage devices 12, 14, 16 and 18. Distributing data blocks across multiple storage devices is known as striping. Parity information for the data blocks distributed among storage devices 12, 14, 16 and 18 in the form of a stripe is stored along with that data as part of the same stripe. For example, a RAID storage controller 20 can distribute or stripe data blocks A, B and C across corresponding storage locations in storage devices 12, 14 and 16, respectively, and then compute parity information for data blocks A, B and C and store the resulting parity information P_ABC in another corresponding storage location in storage device 18. The sequential group of storage locations containing data blocks A, B and C and parity block P_ABC is referred to as a stripe.
A RAID data storage system commonly includes a storage controller that controls the manner and locations in which data is written to and read from the storage array. A storage controller can take the form of, for example, a circuit card, such as a PCIe (Peripheral Component Interconnect Express) card that plugs into a host computer motherboard. For example, as illustrated in FIG. 1, a storage controller 20 can include a processor 22 and local memory 24. Processor 22 is responsible for computing the parity information. In such computations and other operations, processor 22 utilizes local memory 24. To compute the parity in the foregoing example, processor 22 can read data blocks A, B and C from storage devices 12, 14 and 16, respectively, into local memory 24 and then perform an exclusive disjunction operation, commonly referred to as an Exclusive-Or (XOR), on data blocks A, B and C in local memory 24. Processor 22 then stores the computed parity P_ABC in data storage device 18 in the same stripe in which data blocks A, B and C are stored in data storage devices 12, 14 and 16, respectively.
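The XOR parity computation described above can be sketched as follows. The function name xor_blocks and the two-byte block contents are hypothetical values chosen for illustration. The final line illustrates the well-known property that XOR-ing the surviving blocks of a stripe with the parity block reproduces a missing block, which is what allows recovery from a single failed storage device.

```python
def xor_blocks(*blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same size")
    parity = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# Hypothetical two-byte data blocks standing in for data blocks A, B and C.
A = bytes([0x0F, 0xF0])
B = bytes([0x33, 0x33])
C = bytes([0x55, 0xAA])

P_ABC = xor_blocks(A, B, C)            # parity block for the stripe
recovered_B = xor_blocks(A, C, P_ABC)  # rebuild B as if its device had failed
```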
The RAID-5 scheme employs parity rotation, which means that storage controller 20 does not write the parity information for each stripe to the same one of data storage devices 12, 14, 16 and 18 as the parity information for all other stripes. For example, as shown in FIG. 1, parity information P_DEF for data blocks D, E and F is stored on storage device 16, while data blocks D, E and F are stored in the same stripe as parity information P_DEF but on storage devices 12, 14 and 18, respectively. Similarly, parity information P_GHJ for data blocks G, H and J is stored on storage device 14, while data blocks G, H and J are stored in the same stripe as parity information P_GHJ but on storage devices 12, 16 and 18, respectively. Likewise, parity information P_KLM for data blocks K, L and M is stored on storage device 12, while data blocks K, L and M are stored in the same stripe as parity information P_KLM but on storage devices 14, 16 and 18, respectively.
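The rotation pattern described above can be reproduced with a short sketch. The function name stripe_layout and the formula used to select the parity position are assumptions chosen to match the four-device example of FIG. 1; other rotation orders are possible.

```python
DEVICES = [12, 14, 16, 18]  # reference numerals of the storage devices in FIG. 1

def stripe_layout(stripe_index, data_blocks, devices=DEVICES):
    """Map each device to the block it holds for the given stripe."""
    n = len(devices)
    if len(data_blocks) != n - 1:
        raise ValueError("a stripe holds one data block per device, less one for parity")
    # Parity starts on the last device and moves one device to the left per stripe.
    parity_pos = (n - 1 - stripe_index) % n
    remaining = iter(data_blocks)
    layout = {}
    for pos, device in enumerate(devices):
        if pos == parity_pos:
            layout[device] = "P_" + "".join(data_blocks)
        else:
            layout[device] = next(remaining)
    return layout

layouts = [
    stripe_layout(0, ["A", "B", "C"]),
    stripe_layout(1, ["D", "E", "F"]),
    stripe_layout(2, ["G", "H", "J"]),
    stripe_layout(3, ["K", "L", "M"]),
]
```

Under these assumptions, the second stripe places D, E and F on storage devices 12, 14 and 18 and P_DEF on storage device 16, consistent with the example above.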
In database systems, the order or sequence in which input/output or “I/O” operations occur is commonly random, i.e., not inherently predictable by systems external to a host system 26 (e.g., a computer). Host system 26, in association with issuing a write request, does not in every instance request that storage array 10 write a data block to a logical block address (LBA) immediately following an LBA to which host system 26 requested a data block be written in association with the immediately preceding write request. In the example shown in FIG. 1, the logical sequence of data blocks is indicated alphabetically, such that the LBA or logical storage location of data block B in storage array 10 immediately follows the LBA or logical storage location of data block A, the LBA or logical storage location of data block C immediately follows the LBA or logical storage location of data block B, etc. Therefore, host system 26 may, for example, issue a write request requesting that data block B be written to storage array 10 before host system 26 issues a write request requesting that data block A be written to storage array 10. Each time storage controller 20 writes a data block to a stripe in storage array 10, storage controller 20 must update the parity block of that stripe in the above-described manner, e.g., employing a read-modify-write operation in which processor 22 reads data blocks A, B and C from storage devices 12, 14 and 16, respectively, into local memory 24, then computes parity block P_ABC, and stores the computed parity block P_ABC in data storage device 18.
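The cost of updating parity on every single-block write can be illustrated with the following sketch, which models a stripe as a dictionary and counts the device operations incurred. The function name write_block_rmw, the dictionary representation, and the one-byte block contents are hypothetical.

```python
from functools import reduce

def xor_bytes(x, y):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(a ^ b for a, b in zip(x, y))

def write_block_rmw(stripe, name, new_data):
    """Write one data block, then re-read the stripe's data blocks and rewrite parity.

    `stripe` maps data block names to contents, with the parity block under key "P".
    Returns a count of the device reads and writes incurred.
    """
    stripe[name] = new_data                          # one device write
    data_names = [k for k in stripe if k != "P"]
    blocks = [stripe[k] for k in data_names]         # one device read per data block
    stripe["P"] = reduce(xor_bytes, blocks)          # one more device write
    return {"reads": len(blocks), "writes": 2}

# Hypothetical one-byte blocks; the initial parity is 0x01 ^ 0x02 ^ 0x04 = 0x07.
stripe = {"A": b"\x01", "B": b"\x02", "C": b"\x04", "P": b"\x07"}
io = write_block_rmw(stripe, "B", b"\x08")
```

In this sketch, writing a single data block costs three reads and two writes, which is the overhead that motivates the alternative discussed next.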
It is known to employ a data serialization feature in data storage systems so that data and parity blocks are written in a single “full-stripe” write operation. A full-stripe write feature can avoid the inefficiency or overhead stemming from the above-described read-modify-write operation. In a storage controller 20 employing a full-stripe write feature, storage controller 20 serializes data to be written to storage devices 12, 14, 16 and 18. That is, data is not written to a stripe until all of the data blocks and the parity block of that stripe are ready to be written to storage array 10. Then, storage controller 20 writes the data blocks and parity block together as a full stripe. For example, storage controller 20 can store data blocks A, B and C in memory 24, which serves as a cache, until all of data blocks A, B and C have been received from host system 26. Then, storage controller 20 can compute the parity block P_ABC and sequentially or serially write the stripe consisting of data blocks A, B and C and parity block P_ABC.
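The caching behavior of a full-stripe write can be sketched as follows. The class name FullStripeBuffer, its attributes, and the use of a Python list to stand in for storage array 10 are hypothetical; the sketch assumes that all data blocks are the same size.

```python
class FullStripeBuffer:
    """Hold data blocks in a cache until the whole stripe can be written at once."""

    def __init__(self, names):
        self.names = list(names)   # data blocks that make up one stripe, in order
        self.cache = {}            # stands in for the cache in local memory
        self.written = []          # stands in for the storage array

    def receive(self, name, data):
        """Cache one block; write the full stripe once every block has arrived."""
        if name not in self.names:
            raise KeyError(name)
        self.cache[name] = data
        if len(self.cache) < len(self.names):
            return False           # stripe incomplete, nothing written yet
        # All data blocks are present: compute parity once from the cached copies.
        parity = bytes(len(data))
        for n in self.names:
            parity = bytes(a ^ b for a, b in zip(parity, self.cache[n]))
        # Write the data blocks and the parity block together as one stripe.
        self.written.append([self.cache[n] for n in self.names] + [parity])
        self.cache.clear()
        return True

buf = FullStripeBuffer("ABC")
results = [
    buf.receive("B", b"\x02"),   # blocks may arrive out of logical order
    buf.receive("A", b"\x01"),
    buf.receive("C", b"\x04"),   # completes the stripe and triggers the write
]
```

Because parity is computed from the cached copies, no data block needs to be read back from the storage devices before the stripe is written.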
In a system in which storage devices 12, 14, 16 and 18 comprise flash memory, employing a full-stripe write feature not only provides a performance benefit by avoiding a read-modify-write operation but also inhibits degradation of the flash memory cells by distributing write operations more evenly across the cells. In a storage controller 20 employing a full-stripe write feature, storage controller 20 manages an address translation table (not shown) to translate between the host addresses, i.e., the addresses that host system 26 includes in write requests, and the storage locations or addresses at which the data blocks are ultimately stored in storage array 10. Processor 22 commonly retrieves the address translation table into memory 24 in portions, on an as-needed basis, from storage array 10. When storage controller 20 stores data in storage array 10 or otherwise modifies data in storage array 10, processor 22 modifies the relevant portion of the address translation table accordingly.
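Retrieval of the address translation table in portions on an as-needed basis can be sketched as follows. The class name TranslationTable, the constant ENTRIES_PER_PORTION, and the dictionary used to stand in for the copy of the table held in storage array 10 are hypothetical, and the sketch omits writing modified portions back to the storage array.

```python
ENTRIES_PER_PORTION = 4  # hypothetical number of host addresses covered by one portion

class TranslationTable:
    def __init__(self, backing):
        self.backing = backing  # portion index -> {host address: storage location}
        self.cached = {}        # portions currently held in local memory
        self.loads = 0          # number of portions retrieved so far

    def _portion(self, host_addr):
        # Retrieve the portion covering this host address only if it is not yet cached.
        idx = host_addr // ENTRIES_PER_PORTION
        if idx not in self.cached:
            self.cached[idx] = dict(self.backing.get(idx, {}))
            self.loads += 1
        return self.cached[idx]

    def lookup(self, host_addr):
        """Return the storage location for a host address, or None if unmapped."""
        return self._portion(host_addr).get(host_addr)

    def update(self, host_addr, location):
        """Record where the data for a host address is now stored."""
        self._portion(host_addr)[host_addr] = location

# Hypothetical table contents: host address 1 maps to offset 7 on storage device 12.
table = TranslationTable({0: {1: ("device 12", 7)}})
first = table.lookup(1)             # retrieves portion 0
second = table.lookup(2)            # portion 0 is already cached
table.update(5, ("device 14", 3))   # retrieves portion 1, then modifies it
```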