RAID (Redundant Array of Independent/Inexpensive Disks) is an organization of data on a plurality of disks to achieve varying levels of availability and performance. One performance-enhancing feature of RAID is "striping," which spreads data across the disks in the array. Each disk in the RAID array is referred to as a member of the array. Furthermore, while disks are referred to throughout, any equivalent storage media could be used, as would be apparent to one of ordinary skill in the field. The data is broken down into segments referred to as "chunks." A chunk is a group of consecutively numbered blocks that are placed consecutively on a single disk before blocks are placed on a different disk. A block is the smallest unit of data that can be read from or written to a disk. Thus, a chunk is the unit of data interleaving for a RAID array. For example, in a four-disk RAID array the first chunk is placed on the first disk, the second chunk on the second disk, the third chunk on the third disk, the fourth chunk on the fourth disk, the fifth chunk on the first disk, and so on. This spreading of data increases performance through load balancing.
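By way of illustration only, the chunk interleaving described above can be sketched as a mapping from a logical block number to a member disk and a block offset on that disk. The function name locate_block and its parameters are hypothetical, indices are zero-based, and the sketch assumes pure striping with no parity chunks.

```python
from typing import Tuple


def locate_block(block: int, blocks_per_chunk: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Hypothetical helper: assumes pure striping without parity, zero-based indices.
    """
    chunk = block // blocks_per_chunk      # chunk that contains the block
    disk = chunk % num_disks               # chunks rotate across the members
    row = chunk // num_disks               # chunks already placed on that disk
    offset = row * blocks_per_chunk + block % blocks_per_chunk
    return disk, offset
```

With four blocks per chunk on a four-disk array, logical block 16 is the first block of the fifth chunk, so locate_block(16, 4, 4) places it on the first disk (index 0) at offset 4, consistent with the four-disk example above.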
RAID enhances availability of data through data redundancy. In RAID level 4 (RAID-4) and RAID level 5 (RAID-5), data redundancy is achieved by "parity." Parity involves the use of error correction codes (ECC) such as Exclusive-OR or Reed-Solomon. Parity data is stored in the RAID array and is used to reconstruct the data if a disk fails or a data block otherwise becomes unavailable.
As is well known, there are several levels of RAID, each of which has different characteristics that affect performance and availability. One common aspect of all RAID levels is that each array appears as one large virtual disk to the user. RAID storage systems can be implemented in hardware or software. In the hardware implementation, the RAID algorithms are built into a controller that connects to the computer I/O bus. In the software implementation, the RAID algorithms are incorporated into software that runs on the main processor in conjunction with the operating system. In addition, the software implementation can be effected through software running on a well-known RAID controller. Both the hardware and software implementations of RAID are well known to those of ordinary skill in the field.
RAID level 4 (RAID-4) and RAID level 5 (RAID-5) are organizations of data for an array of n+1 disks that provide enhanced performance through the use of striping and enhanced data availability through the use of parity. A parity block is associated with every n data blocks. The data and parity information are distributed over the n+1 disks so that if a single disk fails, all of the data can be recovered. RAID-4 is a level of organization of data for a RAID array in which data blocks are organized into chunks that are interleaved among the disks and protected by parity, and all of the parity is written on a single disk. RAID-5 is a level of organization of data for a RAID array in which data blocks are organized into chunks that are interleaved among the disks and protected by parity, and the parity information is distributed over all of the disks in the array. In both RAID-4 and RAID-5, the ensemble or array of n+1 disks appears to the user as a single, more highly available virtual disk.
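By way of illustration only, the difference in parity placement between RAID-4 and RAID-5 can be sketched as follows. The function name parity_disk is hypothetical, and two details are assumptions that the text above does not specify: the dedicated RAID-4 parity disk is taken to be the last member, and RAID-5 parity is taken to rotate by one disk per row of chunks. Other placements satisfy the same definitions.

```python
def parity_disk(row: int, num_disks: int, level: int) -> int:
    """Return the zero-based index of the disk holding parity for a row of chunks.

    Hypothetical helper. Assumptions: RAID-4 uses the last member as the
    dedicated parity disk; RAID-5 rotates parity by one disk per row.
    """
    if level == 4:
        return num_disks - 1                       # all parity on a single disk
    if level == 5:
        return (num_disks - 1 - row) % num_disks   # parity spread over all disks
    raise ValueError("only RAID levels 4 and 5 are sketched here")


if __name__ == "__main__":
    for row in range(5):
        print(row, parity_disk(row, 5, 4), parity_disk(row, 5, 5))
```

Under these assumptions, a five-disk RAID-4 array keeps parity on disk 4 for every row, whereas a five-disk RAID-5 array places parity on disks 4, 3, 2, 1 and 0 for rows 0 through 4 before the pattern repeats.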
Each bit of the parity block is the Exclusive-OR of the corresponding bits in the n corresponding data blocks. In the event of the failure of a single disk in the array, the information from a given data or parity block on the failed disk is regenerated by calculating the Exclusive-OR of the contents of the corresponding blocks on the surviving disks. A block or set of blocks is repaired by writing the regenerated data. The regeneration and repair of data for a data block or set of data blocks on a disk in a RAID array is referred to as reconstruction.
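By way of illustration only, the Exclusive-OR parity calculation and the regeneration of a lost block can be sketched as follows. The function name xor_blocks and the sample block contents are hypothetical, and the blocks are assumed to be of equal length.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Return the bytewise Exclusive-OR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks (n = 3) and the parity block computed from them.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: regenerate its contents from the
# corresponding blocks on the surviving disks, parity included.
regenerated = xor_blocks([data[0], data[2], parity])
assert regenerated == data[1]
```

The same call regenerates a lost parity block, since the Exclusive-OR of the surviving data blocks is the parity itself.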
When a disk in the RAID array fails, it can be replaced with a new disk and the contents of the failed disk reconstructed using the standard RAID algorithms and the contents of the other disks. In this manner, the RAID array with the replacement disk is restored to its fully redundant state without the loss of application data. Under some circumstances, a failed disk in the RAID array cannot be reconstructed or replaced promptly. During the time that the failed disk remains out of the RAID array, the cost of operations increases and performance and reliability decrease. Accordingly, if the storage system is to operate for any period of time with a failed disk of the RAID array, it is desirable to improve performance and reliability and prevent an increase in the cost of operation. One method of achieving these goals is described herein and in a co-pending application Ser. No. 08/084,370, filed Jun. 29, 1993, titled Method for Reorganizing the Data on a RAID-4 or RAID-5 Array in the Absence of One Disk by Joseph F. Krantz, which is incorporated herein by reference and described hereinafter. In a RAID array with a failed disk, the data is reorganized one strip at a time by regenerating the unavailable data using standard RAID algorithms and writing the regenerated data over the parity information. The process of reorganizing the data is referred to as "folding," and the RAID array with the reorganized data is referred to as "fully folded." After the failed disk is replaced, the fully folded RAID array is returned to its original or normal RAID organization by an "unfolding" process. The unfolding process realigns the data to its original position in the strip, calculates the parity information, and writes the parity information to its original chunk of the strip.
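By way of illustration only, folding and unfolding of a single strip can be sketched as follows. This is a simplified model under stated assumptions and not the method of the referenced application: a strip is represented as a list holding one chunk per disk, the names fold_strip, unfold_strip and xor_chunks are hypothetical, parity is Exclusive-OR, and concurrent application access is ignored entirely.

```python
from functools import reduce
from typing import List, Optional


def xor_chunks(chunks: List[bytes]) -> bytes:
    """Bytewise Exclusive-OR of equal-length chunks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def fold_strip(strip: List[Optional[bytes]], failed: int, parity_pos: int) -> None:
    """Fold one strip in place; strip[failed] is unavailable (None)."""
    if failed == parity_pos:
        return  # only parity was lost; every data chunk is still readable
    survivors = [c for i, c in enumerate(strip) if i != failed]
    # Regenerate the unavailable data and write it over the parity chunk.
    strip[parity_pos] = xor_chunks(survivors)


def unfold_strip(strip: List[Optional[bytes]], replaced: int, parity_pos: int) -> None:
    """Unfold one strip in place after the failed disk has been replaced."""
    if replaced != parity_pos:
        strip[replaced] = strip[parity_pos]  # realign data to its original position
    data = [c for i, c in enumerate(strip) if i != parity_pos]
    strip[parity_pos] = xor_chunks(data)     # recompute parity in its original chunk


# Two data chunks and their parity chunk on a three-disk array.
original = [b"\x01\x02", b"\x10\x20", b"\x11\x22"]
strip: List[Optional[bytes]] = list(original)
strip[1] = None                  # disk 1 fails
fold_strip(strip, failed=1, parity_pos=2)
folded_copy = strip[2]           # regenerated data now occupies the parity chunk
unfold_strip(strip, replaced=1, parity_pos=2)
```

In this model a folded strip holds no parity, so every data chunk can be read directly without regeneration, at the cost of redundancy until the strip is unfolded.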
The folding and unfolding processes are easy to implement provided concurrent application access to the array is inhibited. However, preventing application access to the RAID array is undesirable. Accordingly, it is desirable to complete the above folding and unfolding operations while permitting concurrent application access to the array.