Disk array data storage systems have multiple storage disk drive devices which are arranged and coordinated to form a single mass storage system. There are three primary design criteria for such storage systems: cost, performance, and availability. It is most desirable to produce memory devices that have a low cost per megabyte, a high input/output performance, and high data availability. "Availability" is the ability to access data stored in the storage system and the ability to ensure continued operation in the event of some failure. Typically, data availability is provided through the use of redundancy wherein data, or relationships among data, are stored in multiple locations. In the event that a storage disk in the disk array partially or completely fails, the user data can be reconstructed via the redundant data stored on the remaining disks.
There are two common methods of storing redundant data. According to the first or "mirror" method, data is duplicated and stored in two separate areas of the storage system. For example, in a disk array, the identical data is provided on two separate disks in the disk array. The mirror method has the advantages of high performance and high data availability due to the duplex storing technique. However, the mirror method is also relatively expensive as it effectively doubles the cost of storing data.
In the second or "parity" method, a portion of the storage area is used to store redundant data, but the size of the redundant storage area is less than the remaining storage space used to store the original data. For example, in a disk array having five disks, four disks might be used to store data with the fifth disk being dedicated to storing redundant data. The parity method is advantageous because it is less costly than the mirror method, but it also has lower performance and availability characteristics in comparison to the mirror method.
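By way of illustration only, the five-disk example can be sketched as follows. The sketch assumes that the redundant data is a bytewise exclusive-OR (XOR) of the four data segments, which is a common parity scheme but is not specified in the text above. The helper name xor_blocks and the sample byte values are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data segments, one per data disk (sample values are made up).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]

# The fifth disk holds the parity of the four data segments (assumed XOR).
parity = xor_blocks(data)

# Simulate the loss of the third data disk and rebuild its segment
# from the three surviving data segments plus the parity segment.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
```

Under this assumption, one parity disk protects four data disks, so only one fifth of the raw capacity is spent on redundancy, as compared to one half under the mirror method.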
This invention is particularly directed toward storing data according to parity techniques. In conventional disk arrays, the space on the storage disks is configured into multiple stripes where each stripe extends across the storage disks. Each stripe consists of multiple segments of storage space, where each segment is a portion of the stripe that resides on a single storage disk in the disk array.
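The relationship between stripes, segments, and disks can be sketched as a simple address calculation. The numbering scheme below (segments counted row by row across the disks, starting from zero) is an assumption made for illustration; the text does not prescribe any particular ordering, and the names SegmentLocation and locate_segment are hypothetical.

```python
from typing import NamedTuple


class SegmentLocation(NamedTuple):
    stripe: int  # index of the stripe that contains the segment
    disk: int    # index of the single disk on which the segment resides


def locate_segment(segment_index: int, num_disks: int) -> SegmentLocation:
    """Map a flat segment index to its stripe and disk (assumed row-major order)."""
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    if segment_index < 0:
        raise ValueError("segment_index must be non-negative")
    stripe, disk = divmod(segment_index, num_disks)
    return SegmentLocation(stripe, disk)
```

With five disks, for example, each stripe holds five segments under this numbering, so segment 7 would fall in stripe 1 on disk 2.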
During initialization of a prior art disk array, the storage disks are formatted and the parity for each stripe is set. After initialization, four I/O accesses are required to write data to the disk array: a first I/O to read the data to be updated from a selected stripe, a second I/O to read the corresponding parity for data in that stripe, a third I/O to write new data back to the stripe, and a fourth I/O to write back to the stripe a new parity that accounts for the new data. It would be desirable to reduce the number of I/Os required to write data to stripes in disk arrays.
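The four-access write sequence can be sketched with a small in-memory model that counts each access. The sketch assumes XOR parity, under which the new parity equals the old parity XOR the old data XOR the new data; the text does not state the parity function. The class CountingArray, the function small_write, and the segment keys are hypothetical.

```python
class CountingArray:
    """In-memory stand-in for a disk array that counts every I/O access."""

    def __init__(self, segments):
        self.segments = dict(segments)
        self.io_count = 0

    def read(self, key):
        self.io_count += 1
        return self.segments[key]

    def write(self, key, value):
        self.io_count += 1
        self.segments[key] = value


def xor_bytes(a, b):
    """Bytewise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(array, data_key, parity_key, new_data):
    """Update one data segment and its stripe parity using four I/Os."""
    old_data = array.read(data_key)        # first I/O: read data to be updated
    old_parity = array.read(parity_key)    # second I/O: read corresponding parity
    array.write(data_key, new_data)        # third I/O: write the new data
    # Assumed XOR parity: remove the old data's contribution, add the new one.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    array.write(parity_key, new_parity)    # fourth I/O: write the new parity


# One stripe with two data segments and their parity (0x01 XOR 0x02 = 0x03).
array = CountingArray({"d0": b"\x01", "d1": b"\x02", "p": b"\x03"})
small_write(array, "d0", "p", b"\x07")
```

After the call, array.io_count is 4 and the stored parity equals the XOR of the two current data segments.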
One technique that has been used in some prior art disk arrays is to cache the parity values. This reduces the need to read the parity from the disk array during each write process, thereby reducing the number of I/Os to three. However, there remains a need to further reduce the number of I/Os required to write data to stripes in the disk array.
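The effect of caching the parity values can be sketched as follows. This is a minimal model under stated assumptions, not a description of any particular prior art array: parity is assumed to be a bytewise XOR, the cache is an ordinary in-memory dictionary, and the names CountingArray and ParityCachingWriter are hypothetical.

```python
class CountingArray:
    """In-memory stand-in for a disk array that counts every I/O access."""

    def __init__(self, segments):
        self.segments = dict(segments)
        self.io_count = 0

    def read(self, key):
        self.io_count += 1
        return self.segments[key]

    def write(self, key, value):
        self.io_count += 1
        self.segments[key] = value


def xor_bytes(a, b):
    """Bytewise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class ParityCachingWriter:
    """Writes data segments while keeping parity values in a memory cache."""

    def __init__(self, array):
        self.array = array
        self.parity_cache = {}

    def write(self, data_key, parity_key, new_data):
        old_data = self.array.read(data_key)          # I/O: read old data
        if parity_key in self.parity_cache:
            old_parity = self.parity_cache[parity_key]  # cache hit: no disk I/O
        else:
            old_parity = self.array.read(parity_key)    # cache miss: extra I/O
        new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
        self.array.write(data_key, new_data)          # I/O: write new data
        self.array.write(parity_key, new_parity)      # I/O: write new parity
        self.parity_cache[parity_key] = new_parity    # keep cache current


array = CountingArray({"d0": b"\x01", "d1": b"\x02", "p": b"\x03"})
writer = ParityCachingWriter(array)

writer.write("d0", "p", b"\x07")   # parity not yet cached: four I/Os
ios_first = array.io_count
writer.write("d1", "p", b"\x09")   # parity served from cache: three I/Os
ios_second = array.io_count - ios_first
```

In this model the first write to a stripe still costs four accesses because the cache is empty, while each later write to the same stripe costs three.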