This invention relates to methods of tracking incomplete writes in a disk array, and to disk storage systems which perform such methods.
In the prior art, the term "RAID" disk array has been defined to mean any Redundant Array of Inexpensive Disks; and several different RAID disk arrays have been defined. These include a Level One RAID disk array, a Level Three RAID disk array and a Level Five RAID disk array. See "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by Patterson, et al., Report No. UCB/CSD 87/391, December 1987, Computer Science Division of the University of California at Berkeley.
With a Level Five RAID disk array, both parity and data are striped across a set of several disks. FIG. 1 shows one example of a Level Five RAID disk array in which the array resides on a set of five disks that are labeled Disk 0, Disk 1 . . . Disk 4. Each column of the array contains data and parity which are stored in a single disk of the set. Each row of the array contains data and parity which are striped across all five disks of the set.
In FIG. 1, each row of the array consists of one parity chunk which resides on one disk, and four data chunks which reside on four other disks. Also, each data chunk and each parity chunk is partitioned into several physical blocks. A single block is the smallest portion of a chunk that can be separately addressed by a user program with a read or write command. In FIG. 1, there are eight blocks per chunk. Each block consists of a predetermined number of bytes (e.g., 512 bytes) plus one cyclic redundancy check byte called the "CRC" byte.
In the FIG. 1 array, block 0 in row 0 is addressed by a read/write command with a logical address of 0. As this logical address is sequentially incremented by one, the data blocks are addressed in the following order: blocks 1-7 of data chunk 0, blocks 0-7 of data chunk 1, blocks 0-7 of data chunk 2, blocks 0-7 of data chunk 3, blocks 8-15 of data chunk 4, blocks 8-15 of data chunk 5, etc. For example, block 8 of data chunk 5 has a logical address of 40.
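The addressing order described above can be illustrated with a short sketch. It assumes the layout stated for FIG. 1 (four data chunks per row, eight blocks per chunk); the constant and function names are hypothetical. Which physical disk holds each chunk depends on the parity placement drawn in FIG. 1, so the sketch does not compute a disk number.

```python
# Layout assumed from the description of FIG. 1 (names are illustrative).
DATA_CHUNKS_PER_ROW = 4   # four data chunks plus one parity chunk per row
BLOCKS_PER_CHUNK = 8      # eight blocks per chunk


def locate(logical_address):
    """Return (data_chunk, block_number) for a logical block address."""
    data_chunk, offset = divmod(logical_address, BLOCKS_PER_CHUNK)
    row = data_chunk // DATA_CHUNKS_PER_ROW
    # Block numbers continue from row to row: 0-7 in row 0, 8-15 in row 1, ...
    block_number = row * BLOCKS_PER_CHUNK + offset
    return data_chunk, block_number


print(locate(0))    # (0, 0): block 0 of data chunk 0
print(locate(40))   # (5, 8): block 8 of data chunk 5
```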
When a block of data is written, the CRC byte within that block is also generated and written. Further, the parity block which has the same block number as the data block is also generated and written. This parity block is written using odd parity or even parity.
With even parity, the exclusive-or of a parity block and all data blocks that have the same block number produces a block of all "0's". Conversely, with odd parity, the exclusive-or of a parity block and all data blocks that have the same block number produces a block of all "1's".
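As a minimal sketch of the two parity conventions, the following example builds a parity block for four small data blocks and checks the exclusive-or of the full set. The helper names `xor_blocks` and `make_parity` and the four-byte block size are assumptions chosen for brevity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise exclusive-or of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks, odd=False):
    """Even parity is the XOR of the data; odd parity is its complement."""
    parity = xor_blocks(data_blocks)
    if odd:
        parity = bytes(b ^ 0xFF for b in parity)
    return parity


# Four data blocks sharing one block number (4 bytes each for brevity).
data = [bytes([i, i + 1, i + 2, i + 3]) for i in (10, 20, 30, 40)]

even = make_parity(data)
odd = make_parity(data, odd=True)

even_check = xor_blocks(data + [even])   # every bit is 0
odd_check = xor_blocks(data + [odd])     # every bit is 1
```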
One way to generate the new parity block for a new data block that is to be written is as follows. First, the existing data block and its parity block are read from their respective disks. Then the new parity is calculated as the parity block which was read exclusive-or'd with the data block which was read exclusive-or'd with the new data block. This new parity block and the new data block are then written on their respective disks.
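The read-modify-write calculation just described can be sketched as follows. The function names and the two-byte blocks are illustrative assumptions; the example uses even parity and compares the result with a parity block recomputed from all four data blocks.

```python
def xor_bytes(a, b):
    """Byte-wise exclusive-or of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity, old_data, new_data):
    """New parity = old parity XOR old data XOR new data."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Four data blocks with the same block number, plus their even parity.
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]
parity = b"\x00\x00"
for block in data:
    parity = xor_bytes(parity, block)

# Overwrite data block 1 using only the old data block and old parity block.
new_block = b"\xaa\xbb"
new_parity = updated_parity(parity, data[1], new_block)
data[1] = new_block

# Cross-check against a parity block recomputed from every data block.
recomputed = b"\x00\x00"
for block in data:
    recomputed = xor_bytes(recomputed, block)
```

Only two blocks are read and two are written, regardless of how many disks the row spans. The same formula holds for odd parity, because the complement in the old parity block carries through the exclusive-or.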
During the execution of a read command, the CRC byte is regenerated from the block of data that is read. If the regenerated CRC byte differs from the stored CRC byte, then the block of data which is read contains an error. To correct this error, the erroneous data block is regenerated by a) reading all of the other blocks (data and parity) on the disks which have the same block number as the erroneous data block; and b) exclusive-oring those blocks together.
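The read path above can be sketched as follows, assuming even parity. The text does not specify how the CRC byte is computed, so the CRC-8 routine below (polynomial 0x07) is only a stand-in, and `read_block` is a hypothetical helper rather than a function of any actual disk controller.

```python
from functools import reduce


def crc8(data, poly=0x07):
    """Bitwise CRC-8; the polynomial is an assumption, not from the text."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def xor_blocks(blocks):
    """Byte-wise exclusive-or of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_block(stripe, crcs, index):
    """Return block `index`, regenerating it if its CRC byte does not match.

    `stripe` holds every block (data and parity) with the same block number.
    """
    block = stripe[index]
    if crc8(block) == crcs[index]:
        return block
    others = [b for i, b in enumerate(stripe) if i != index]
    return xor_blocks(others)   # valid for even parity


data = [bytes([n] * 8) for n in (1, 2, 3, 4)]
stripe = data + [xor_blocks(data)]
crcs = [crc8(b) for b in stripe]

# Flip one bit in data block 2 to simulate a media error.
stripe[2] = bytes([stripe[2][0] ^ 0x01]) + stripe[2][1:]
recovered = read_block(stripe, crcs, 2)
```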
Consider now the case where the execution of a particular write command is started which attempts to write data into a block having a particular block number "i", but that execution is interrupted before it is completed. Such an interruption can occur, for example, due to a power failure.
In the above case, the interruption can occur after the writing of the new data block is completed but before the writing of the new parity block has begun. Similarly, the interruption can occur after the writing of the new parity block has completed but before the writing of the new data block has begun. In either case, the exclusive-or of all blocks having block number "i" will not equal a block of all "0's" or all "1's". At the same time, the CRC byte for every block with block number "i" will be correct.
After the cause of the interruption is fixed, the array will continue to be read and written by the user programs. If any data block having block number "i" is read and the CRC byte detects an error, then an attempt will be made to regenerate the erroneous data block by exclusive-oring together the remaining data blocks and parity block with the same block number. But, due to the prior incomplete write, that regeneration process will not work.
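The failure mode can be made concrete with a small sketch, assuming even parity and four-byte blocks (both illustrative). A new data block is written but its parity block is left stale, as if power had failed between the two writes; a later attempt to regenerate a different data block with the same block number then yields wrong data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise exclusive-or of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks with block number "i" and their even parity block.
data = [bytes([n] * 4) for n in (1, 2, 3, 4)]
parity = xor_blocks(data)
original_block_2 = data[2]

# Interrupted write: the new data block reaches the disk, the parity does not.
data[0] = bytes([9] * 4)

# The blocks with this block number no longer exclusive-or to all "0's".
consistent = xor_blocks(data + [parity]) == bytes(4)

# Later, data block 2 is found to be in error and is regenerated from the rest.
regenerated_block_2 = xor_blocks([data[0], data[1], data[3], parity])
```

Here `consistent` is false and `regenerated_block_2` differs from `original_block_2`, even though each block taken alone would still pass its own CRC check.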
This problem was addressed in the prior art by providing a flag in the disk array which was set to "1" when the array started to run, and reset to "0" when the array stopped running in a normal fashion. Thus, if the flag was already "1" when the array started to run, before it was set again, the normal operation of the array must have previously been interrupted.
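The behavior of such a flag can be sketched as follows. The class and method names are hypothetical, and the in-memory attribute merely stands in for a flag that would be stored in the disk array itself.

```python
class ArrayFlag:
    """Stand-in for a single flag kept in the disk array."""

    def __init__(self):
        self.flag = 0

    def start(self):
        """Set the flag; return True if the previous run was interrupted."""
        interrupted = self.flag == 1   # flag already "1" before being set
        self.flag = 1
        return interrupted

    def stop(self):
        """Normal stop: reset the flag to "0"."""
        self.flag = 0


array = ArrayFlag()
first_start = array.start()     # False: nothing ran before
array.stop()                    # normal stop
second_start = array.start()    # False: previous run stopped normally
# A power failure occurs here, so stop() is never called.
third_start = array.start()     # True: previous run was interrupted
```

The flag reports only that an interruption occurred; it carries no information about which block was being written at the time.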
However, a drawback with the above prior art flag is that after the flag is found to be in the incorrect state, it takes too long to identify the particular block that is incompletely written. To find the incompletely written block, every data block and every parity block in the entire array must be read; then parity blocks must be recalculated from the read data blocks; and then the recalculated parity blocks must be compared to the read parity blocks. For large arrays, this process can take over a day to complete.
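The exhaustive check described above can be sketched as follows, assuming even parity. The dictionary layout and the names `scan_array` and `xor_blocks` are illustrative. The point of the sketch is that every data block and every parity block must be read, so the work grows with the size of the whole array rather than with the number of interrupted writes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise exclusive-or of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scan_array(array):
    """Recalculate parity for every block number and compare it.

    `array` maps a block number to (list of data blocks, parity block).
    Returns the inconsistent block numbers and the count of blocks read.
    """
    inconsistent = []
    blocks_read = 0
    for block_number, (data_blocks, parity) in sorted(array.items()):
        blocks_read += len(data_blocks) + 1
        if xor_blocks(data_blocks) != parity:
            inconsistent.append(block_number)
    return inconsistent, blocks_read


# A small array: 16 block numbers, four data blocks and one parity block each.
array = {}
for n in range(16):
    data_blocks = [bytes([n + k] * 4) for k in range(4)]
    array[n] = (data_blocks, xor_blocks(data_blocks))

# Simulate an incomplete write at block number 5 (data written, parity stale).
array[5][0][0] = bytes([0xEE] * 4)

bad, blocks_read = scan_array(array)
```

In this example the scan reads all 80 blocks in order to find the single inconsistent block number.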
Accordingly, a primary object of the present invention is to provide a method of tracking incomplete writes in a disk array and a disk storage system which performs such method, which eliminate the need to check parity for every block in the entire array after the normal operation of the array has been interrupted.