1. Field of the Invention
The present invention relates to the control of multiple disk drives within computer systems and more particularly to a more efficient method for recovering data stored on a drive in a mass storage disk drive array subsystem for a personal computer system.
2. Description of the Related Art
With the ever increasing amount of data being processed by today's computer systems, it is often desirable to have a mass storage subsystem to transfer large amounts of data to and from the computer system. Such a mass storage subsystem is commonly found in a local area network (LAN), wherein information and files stored on one computer, called a server, are distributed to local work stations having limited or no mass storage capabilities. Both its storage capacity and data transfer rate measure the mass storage subsystem's ability to meet the demands of the LAN. The need for very high data transfer rates can be readily appreciated given the high performance requirements of video graphic work stations used in computer aided design and animation work.
In order to achieve a mass storage subsystem with a high data transfer rate, a disk drive array subsystem, hereinafter referred to as a drive array, was developed in which multiple standard disk drives were "ganged" together in order to effect the parallel transfer of data between the drives of the drive array and system memory. This type of drive array is commonly referred to as a Redundant Array of Inexpensive Disks (RAID). If n drives are grouped together, then the effective data transfer rate is increased n times. This technique, called "striping," originated in the supercomputing environment where the transfer of large amounts of data to and from secondary storage is a frequent requirement. With this approach, the n physical drives become one or more logical volumes, and the striping may be implemented either through hardware or software.
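By way of illustration only, the striping described above can be sketched as a mapping from a logical block number to a physical drive and a stripe. The function name locate_block and the round-robin placement are assumptions made for this sketch; actual drive arrays may map blocks differently.

```python
def locate_block(logical_block: int, n_drives: int) -> tuple[int, int]:
    """Return (drive index, stripe index) for a logical block.

    Assumes simple round-robin striping with one logical block per drive
    per stripe and no parity or mirror drives (hypothetical layout).
    """
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    stripe, drive = divmod(logical_block, n_drives)
    return drive, stripe


# Eight consecutive logical blocks spread over four drives.
layout = [locate_block(block, 4) for block in range(8)]
```

Under this assumed layout, consecutive logical blocks fall on different drives, which is what allows n drives to transfer the blocks of one stripe in parallel.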
Although a drive array has a much faster data transfer rate than a single physical drive with the same storage capacity, the overall drive failure rate of an n-drive drive array is n times that of the single high capacity drive; therefore, data protection configurations were developed to enhance the data integrity of the drive array. One such data protection configuration is known as mirroring, or alternatively as RAID1, wherein each main drive of the drive array has a duplicate drive, referred to as a mirror drive. A stripe consists of main logical blocks each having a duplicate mirror logical block. Thus, if the data in a main logical block becomes corrupted, the correct main logical block can be recovered from its associated mirror logical block.
Because the RAID1 architecture requires a duplicate drive for each main drive, drive array designers developed data protection configurations employing parity protection which only require one additional drive. One such system is known as RAID4. A RAID4 configuration employs mapping in which data are stored across all but one drive in a stripe. The remaining drive is the parity drive and it contains the parity, that is, the XOR value, of the data blocks in the stripe. The stripe consists of n data logical blocks and one logical parity block, wherein each drive provides a logical block of the stripe. The logical blocks, which include one or more disk sectors, are the same size for each stripe size. The stripe size can vary within the drive array and within the logical volume itself. A write operation to a logical volume consists of either writing all data logical blocks of a stripe to the logical volume or writing less than all data logical blocks of the stripe to the logical volume. The former is known as a full stripe write, and the latter is known as a partial stripe write. The parity logical block must be updated regardless of whether a partial or a full stripe write occurs. The parity logical block is created using an exclusive-or (XOR) technique as known to those skilled in the art. Should the data in one logical block become corrupted, a correct logical block can be regenerated from the other logical blocks using the known XOR technique.
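The XOR technique referred to above can be illustrated with a short sketch. The helper name xor_blocks and the sample byte values are assumptions chosen for illustration. The partial stripe write shown at the end uses the commonly known identity that the new parity equals the old parity XOR the old data XOR the new data; the text above does not state which update method a given drive array uses.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the bytewise XOR of one or more equal-length logical blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must be the same size")
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Full stripe write: parity is the XOR of all data logical blocks.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(*data)

# Regeneration: a corrupted block is the XOR of the other blocks and parity.
recovered = xor_blocks(data[0], data[2], parity)

# Partial stripe write (assumed read-modify-write update of one block).
new_block = b"\xff\x00"
updated_parity = xor_blocks(parity, data[1], new_block)
```

In this sketch, recovered equals the original second data block, and updated_parity matches the parity that a full recomputation over the modified stripe would produce.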
The configuration of physical drives which comprise the drive array is transparent to the computer system. Instead, the host computer system addresses the striped data in the drive array by logical volume and logical block number. Each logical volume includes one or more stripes of data. Logical blocks belonging to several logical volumes might use one physical drive.
There are many variations of the parity and mirror fault tolerant data protection schemes described above. Another parity fault tolerant data protection scheme is RAID5, which does not use a dedicated physical parity drive as in RAID4, but rather interleaves the parity logical blocks with the data logical blocks among the n drives of the drive array. A write operation to the drive array always generates new parity information. Thus, for every write operation, the dedicated parity drive of the RAID4 data protection scheme must be accessed. The RAID5 data protection scheme, by contrast, accesses the physical drives more evenly. Additionally, another data protection scheme is known informally as RAID10, wherein each of the main physical drives in a RAID5 system has a mirror drive.
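The difference in parity placement between RAID4 and RAID5 can be sketched as follows. The function name parity_drive and the particular rotation order are assumptions for illustration only; the text above does not specify how the parity logical blocks are interleaved.

```python
from collections import Counter


def parity_drive(stripe: int, n_drives: int, dedicated: bool) -> int:
    """Return the index of the drive holding parity for a given stripe.

    dedicated=True models RAID4 (parity always on the last drive).
    dedicated=False models RAID5 with a hypothetical rotation that moves
    parity one drive to the left on each successive stripe.
    """
    if n_drives < 2:
        raise ValueError("a parity array needs at least two drives")
    if dedicated:
        return n_drives - 1
    return (n_drives - 1 - stripe) % n_drives


# Count how many parity updates each drive receives over eight stripes.
raid4_load = Counter(parity_drive(s, 4, dedicated=True) for s in range(8))
raid5_load = Counter(parity_drive(s, 4, dedicated=False) for s in range(8))
```

With four drives and eight stripes, the RAID4 case places all eight parity blocks on one drive, whereas the assumed RAID5 rotation places two on each drive, which illustrates why RAID5 accesses the physical drives more evenly.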
The above-mentioned fault tolerant data protection schemes employed in drive arrays, while promoting the stripe data integrity of the drive array, also provide a means for rebuilding a physical replacement drive should one of the main physical drives of the drive array fail. The remaining drives of the array provide the information necessary to rebuild the physical replacement drive. For a RAID1 system, the physical replacement drive can be rebuilt from the failed disk drive's mirror drive. For a RAID4 or RAID5 system, a logical block on the replacement drive is rebuilt by XORing the other logical blocks of its stripe, and this process is repeated until all logical blocks of the replacement drive are rebuilt.
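The stripe-by-stripe rebuild described above for a RAID4 or RAID5 system can be sketched as follows. The names xor_blocks and rebuild_drive, the in-memory representation of each drive as a list of logical blocks, and the sample data are assumptions for illustration; an actual drive array would read and write physical sectors.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length logical blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_drive(drives, failed):
    """Rebuild every logical block of the failed drive.

    drives[i][s] is the logical block that drive i contributes to stripe s;
    the entry at index `failed` is ignored. Each rebuilt block is the XOR
    of the other logical blocks of its stripe.
    """
    survivors = [blocks for index, blocks in enumerate(drives) if index != failed]
    return [xor_blocks(stripe) for stripe in zip(*survivors)]


# Three data drives plus one parity drive, two stripes each (sample values).
data = [
    [b"\x01\x02", b"\x10\x20"],
    [b"\x0a\x0b", b"\xa0\xb0"],
    [b"\xff\x00", b"\x00\xff"],
]
parity = [xor_blocks(stripe) for stripe in zip(*data)]
array = data + [parity]

lost = array[1]          # remember the contents for comparison
array[1] = None          # simulate the failure of drive 1
replacement = rebuild_drive(array, failed=1)
```

Because XOR treats data and parity blocks alike, the same loop applies whether the failed drive held data, parity, or an interleaved mix of both.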
In a computer system employing the drive array, it is desirable that the drive array remain on-line should a physical drive of the drive array fail. Such is the case for the LAN. If a main physical drive should fail, drive arrays currently have the capability of allowing a spare physical replacement drive to be rebuilt without having to take the entire drive array off-line. Furthermore, intelligent drive array subsystems currently exist which can rebuild the replacement drive transparently to the computer system while the drive array is still otherwise operational. Such a system is disclosed in Schultz et al., U.S. Pat. No. 5,101,492, entitled "Data Redundancy and Recovery Protection," which is hereby incorporated by reference.
Time is critical when rebuilding a physical drive of a drive array because if another main physical drive fails during the rebuilding process, all of the data stored may be lost. Thus, it is desirable to minimize the rebuild time of the physical replacement drive in order to improve the data integrity of the drive array.
Although it is desirable to rebuild a physical drive in a timely and efficient manner while the remainder of the drive array is still operational, the ongoing rebuild operation must compete with system requests, especially those system requests requiring the access of logical volumes that are fully operational. The drive array must process system requests along with internal requests generated by rebuilding operations. Thus, it would be desirable for a user of the computer system to have the capability to adjust the priority of the rebuild operations of the drive array, thereby assigning the rebuild operations of the drive array lower priority during peak computer system usage times and higher priority during times of reduced computer system activity.
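The competition between system requests and rebuild requests can be illustrated with a purely hypothetical sketch in which a single adjustable parameter sets the share of service slots offered to rebuild work. The function name schedule, the parameter rebuild_share, and the credit-based interleaving are assumptions made for illustration and are not taken from any particular drive array.

```python
from collections import deque


def schedule(system_requests, rebuild_requests, rebuild_share):
    """Return the order in which queued requests would be serviced.

    rebuild_share (0.0 to 1.0) is a hypothetical user-adjustable priority:
    the fraction of service slots given to rebuild requests while system
    requests are also waiting. Rebuild work always proceeds when no system
    request is pending.
    """
    if not 0.0 <= rebuild_share <= 1.0:
        raise ValueError("rebuild_share must be between 0.0 and 1.0")
    system, rebuild = deque(system_requests), deque(rebuild_requests)
    order = []
    credit = 0.0
    while system or rebuild:
        credit += rebuild_share
        if rebuild and (credit >= 1.0 or not system):
            order.append(rebuild.popleft())
            if credit >= 1.0:
                credit -= 1.0  # spend one slot of accumulated rebuild credit
        else:
            order.append(system.popleft())
    return order


peak = schedule(["s0", "s1", "s2", "s3"], ["r0", "r1"], rebuild_share=0.0)
idle = schedule(["s0", "s1", "s2", "s3"], ["r0", "r1"], rebuild_share=0.5)
```

With a share of 0.0, rebuild requests are serviced only after all waiting system requests, as might suit peak usage times; with a larger share, rebuild requests are interleaved more often, shortening the rebuild at the cost of system request latency.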