In a RAID disk storage array, data is written in “stripes” across the drives of the disk array so that subsequent accesses of the data will be able to take advantage of the combined transfer rate of the drives of the array for “large” accesses. Since the smallest addressable unit of storage for a disk drive typically is the sector, a stripe will consist of at least one sector per drive. For RAID-3 and RAID-5 configurations, a redundancy pattern is computed across the stripe and stored along with the data to enable error checking and correction, even in the event of a drive failure.
To illustrate, FIG. 1A is a conceptual diagram of a disk storage array, in which five drives are shown (Drive 0-Drive 4), although the number of drives is not critical. Each individual square in the figure represents one block of data, in the generic sense of any predetermined unit of storage. The drives labeled 0-3 are data drives. Drive 4 is the parity or redundancy drive in this configuration, generally known as RAID-3. Data is “striped” over the data drives. This means that for any selected stripe width, consecutive data blocks of that size, for example one sector, are stored in sequence across consecutive data drives. This sequence is indicated by the Arabic numerals in each storage block.
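The striping order described above can be sketched as a simple address mapping. This is only an illustration, assuming four data drives and a stripe unit of one block as in FIG. 1A; the function name `locate_block` and the constant `NUM_DATA_DRIVES` are hypothetical and do not come from any cited reference.

```python
NUM_DATA_DRIVES = 4  # assumption: Drives 0-3 hold data, Drive 4 holds parity


def locate_block(block_number, num_data_drives=NUM_DATA_DRIVES):
    """Return (drive, stripe) for a logical block striped in sequence."""
    # Consecutive blocks fall on consecutive data drives; after the last
    # data drive, the sequence continues on the next stripe.
    stripe, drive = divmod(block_number, num_data_drives)
    return drive, stripe


# Blocks 4-7 all belong to stripe 1, one block per data drive.
for block in range(4, 8):
    print(block, locate_block(block))
```

Under these assumptions, blocks 0-3 occupy stripe 0 and blocks 4-7 occupy stripe 1, consistent with the numbering described for the figure.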
The bit-by-bit exclusive-OR function of the four data blocks (for example, 0-3) that make up each stripe is stored in the corresponding block of the parity drive. The exclusive-OR notation in FIG. 1A is “X[A:B]”, indicating the exclusive-OR of the blocks of user data beginning with A and ending with B. Thus, for example, the XOR function for blocks 4-7 is shown in Drive 4 as “X[4:7]”. Using this RAID-3 configuration, the contents of a block of data on any failed drive can be reconstructed by computing the exclusive-OR of the remaining blocks of its stripe, including the parity block. “On the fly” reconstruction of data is taught in commonly-assigned U.S. Pat. No. 6,237,052, hereby incorporated by reference. U.S. Pat. No. 6,237,052, however, does not address the problem of updates to data that affect less than one stripe.
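The parity computation and the reconstruction of a failed drive's block can be illustrated with a short sketch. It assumes four tiny two-byte data blocks with made-up contents; the helper name `xor_blocks` is hypothetical, and the sketch is not the reconstruction method of U.S. Pat. No. 6,237,052.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bit-by-bit exclusive-OR of equal-length byte blocks."""
    # zip(*blocks) groups the bytes at the same offset in every block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Illustrative contents for the four data blocks of one stripe.
data = [
    bytes([0x0F, 0xA0]),
    bytes([0x33, 0x0C]),
    bytes([0x55, 0xFF]),
    bytes([0x01, 0x10]),
]
parity = xor_blocks(data)  # what the parity drive would hold for this stripe

# Suppose the drive holding data[2] fails: XOR the surviving data blocks
# together with the parity block to recover the missing block.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])
```

The recovery works because XOR-ing a value with itself cancels it, so combining the parity with the surviving blocks leaves only the missing block.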
FIG. 1B is similar to FIG. 1A except that the parity data is distributed over all of the drives of the array, thereby creating a RAID-5 configuration. The RAID-5 organization typically is used for systems in which the parity writes to a single drive would create a performance bottleneck. The RAID-5 configuration allows all of the drives of the array to participate concurrently in the parity writes, thereby easing the bottleneck.
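One way to picture distributed parity is a rotation of the parity block from stripe to stripe. The sketch below is an assumption for illustration only: the exact placement used in FIG. 1B is not specified here, and the rotation rule and the names `parity_drive` and `data_drives` are hypothetical.

```python
def parity_drive(stripe, num_drives=5):
    """Drive holding parity for a stripe, under an assumed rotation rule."""
    # Assumption: parity starts on the last drive and moves one drive
    # to the left on each successive stripe, wrapping around.
    return (num_drives - 1) - (stripe % num_drives)


def data_drives(stripe, num_drives=5):
    """Drives holding user data for a stripe (all except the parity drive)."""
    p = parity_drive(stripe, num_drives)
    return [d for d in range(num_drives) if d != p]


for stripe in range(5):
    print(stripe, parity_drive(stripe), data_drives(stripe))
```

With this rule, any five consecutive stripes place parity on each drive exactly once, so parity writes are spread evenly rather than concentrated on one drive.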
U.S. Pat. No. 5,805,788 describes RAID-5 parity generation and data reconstruction in greater detail. In particular, it discloses a “brute force” method comprising reading the data from a local buffer, computing the parity, and then writing the result back to the buffer. That methodology has limited application, however, because buffer bandwidth tends to be the bottleneck in systems that have a fast host bus and a large array of drives.
U.S. Pat. No. 6,233,648 is entitled “Disk Storage System And Data Update Method Used Therefor.” This patent discloses a disk write method in which updates, which are not necessarily blocks of contiguous data, are accumulated until there is a convenient amount (e.g., a stripe), and then the accumulated data is written as a block to a new area on the array. While this technique makes writes very efficient, read operations require a special table to find the data.
In any disk storage array, when only a portion of a stripe of data is updated by the host system (a “partial-stripe update”), the balance of the stripe must be accessed from the drives (essentially a read operation), so that a new redundancy pattern can be correctly computed on the entire updated stripe. In the prior art, a buffer is allocated (typically in RAM) in which to assemble the new stripe. Updated data is written from the host into the buffer. In the buffer, sectors corresponding to the data updated by the host are valid, while the contents of the remaining sectors of the stripe are temporarily undefined.
The disk array controller further allocates a second buffer (also typically in RAM), into which it reads the current contents of the entire stripe from the drives. The controller then copies all of the sectors which had not been updated by the host, from the second buffer (disk image stripe buffer) to the first buffer (the new stripe buffer), where they are merged with the updated data from the host to complete the updated stripe. At this point, the first stripe buffer will contain all valid data, with new sectors from the host and current (old) sectors from the drives. An updated redundancy can now be computed.
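The two-buffer merge just described can be sketched as follows. This is a simplified model, assuming one-byte sectors and representing "undefined" sectors in the new stripe buffer as `None`; the names `partial_stripe_update` and `xor_blocks` are hypothetical and do not correspond to any particular controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bit-by-bit exclusive-OR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def partial_stripe_update(disk_stripe, host_updates):
    """Merge host-updated sectors with current disk sectors and recompute parity.

    disk_stripe  -- list of sectors (bytes) currently stored on the data drives
    host_updates -- dict mapping sector index to new sector contents
    """
    count = len(disk_stripe)
    # First buffer: the new stripe. Only host-written sectors are valid.
    new_stripe = [host_updates.get(i) for i in range(count)]
    # Second buffer: the current contents of the entire stripe, read from disk.
    disk_image = list(disk_stripe)
    # Copy every sector the host did not update from the second buffer
    # into the first buffer.
    for i in range(count):
        if new_stripe[i] is None:
            new_stripe[i] = disk_image[i]
    # The first buffer now holds all valid data, so redundancy can be computed.
    return new_stripe, xor_blocks(new_stripe)


old = [b"\x01", b"\x02", b"\x04", b"\x08"]
new_stripe, parity = partial_stripe_update(old, {1: b"\x10"})
print(new_stripe, parity)
```

Even in this small model, the whole stripe is read into the second buffer and the unmodified sectors are copied again, although the host changed only one sector.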
Ideally, a stripe buffer in the controller would be written once by the host and read once in order to write to the disk array. For the partial-stripe update scenario just described, however, in addition to the normal read and write of the buffer, an additional operation is required to access the current contents of the stripe, and additional reads and writes are required to copy those sectors which were not updated by the host, as described above. These additional operations increase the frequency of disk access and negatively impact disk array performance. What is needed are more efficient methods and apparatus for processing partial-stripe updates to data stored in a disk storage array such as a RAID system.