1. Field of the Invention
The present invention relates to the control of multiple disk drives within computer systems and more particularly to a more efficient method for recovering data stored on a drive in a mass storage disk drive array subsystem for a personal computer system.
2. Description of the Related Art
With the ever increasing amount of data being processed by today's computer systems, it is often desirable to have a mass storage subsystem to transfer large amounts of data to and from the computer system. Such a mass storage subsystem is commonly found in a local area network (LAN), wherein information and files stored on one computer, called a server, are distributed to local work stations having limited or no mass storage capabilities. Both its storage capacity and data transfer rate measure the mass storage subsystem's ability to meet the demands of the LAN. The need for very high data transfer rates can be readily appreciated given the high performance requirements of video graphic work stations used in computer aided design and animation work.
In order to achieve a mass storage subsystem with a high data transfer rate, a disk drive array subsystem, hereinafter referred to as a drive array, was developed in which multiple standard disk drives were "ganged" together in order to effect the parallel transfer of data to or from the drives of the drive array to system memory. This type of drive array is commonly referred to as a Redundant Array of Inexpensive Disks (RAID). If n drives are grouped together, then the effective data transfer rate is increased n times. This technique, called "striping," originated in the supercomputing environment where the transfer of large amounts of data to and from secondary storage is a frequent requirement. With this approach, the n physical drives become one or more logical volumes, and the striping may be implemented either through hardware or software.
Although a drive array has a much faster data transfer rate than a single physical drive with the same storage capacity, the overall drive failure rate of an n-drive drive array is n times that of the single high capacity drive; therefore, data protection configurations were developed to enhance the data integrity of the drive array. One such data protection configuration is known as mirroring, or alternatively as RAID1, wherein each main drive of the drive array has a duplicate drive, referred to as a mirror drive. A stripe consists of main logical blocks each having a duplicate mirror logical block. Thus, if the data in a main logical block becomes corrupted, the correct main logical block can be recovered from its associated mirror logical block.
Because the RAID1 architecture requires a duplicate drive for each main drive, drive array designers developed data protection configurations employing parity protection which only require one additional drive. One such system is known as RAID4. A RAID4 configuration employs mapping in which data are stored across all but one drive in a stripe. The remaining drive is the parity drive, and it contains the parity, an XOR value of the data blocks in the stripe. The stripe consists of n data logical blocks and one logical parity block, wherein each drive provides a logical block of the stripe. The logical blocks, which include one or more disk sectors, are the same size for each stripe size. The stripe size can vary within the drive array and within the logical volume itself. A write operation to a logical volume consists of either writing all data logical blocks of a stripe to the logical volume or writing less than all data logical blocks of the stripe to the logical volume. The former is known as a full stripe write, and the latter is known as a partial stripe write. The parity logical block must be updated regardless of whether a partial or a full stripe write occurs. The parity logical block is created using an exclusive-or (XOR) technique as known to those skilled in the art. Should the data in one logical block become corrupted, a correct logical block can be regenerated from the other logical blocks using the known XOR technique.
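The XOR technique described above can be illustrated with a minimal sketch. The helper name xor_blocks, the block contents, and the four-byte block size are hypothetical choices made for illustration only and are not taken from any particular drive array implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized logical blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all logical blocks in a stripe must be the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical stripe: three data logical blocks of four bytes each.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity logical block is the XOR of all data logical blocks.
parity = xor_blocks(data)

# If the second data block is lost, XORing the surviving data blocks
# with the parity block regenerates it.
recovered = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, the same routine serves both to create the parity logical block and to regenerate any single missing logical block of the stripe.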
The configuration of physical drives which comprise the drive array is transparent to the computer system. Instead, the host computer system addresses the striped data in the drive array by logical volume and logical block number. Each logical volume includes one or more stripes of data. Logical blocks belonging to several logical volumes might use one physical drive.
There are many variations of the parity and mirror fault tolerant data protection schemes described above. Another parity fault tolerant data protection scheme is RAID5, which does not use a dedicated physical parity drive as in RAID4; rather, the parity logical blocks are interleaved with the data logical blocks among the n drives of the drive array. A write operation to the drive array always generates new parity information. Thus, for every write operation, the dedicated parity drive of the RAID4 data protection scheme must be accessed. The RAID5 data protection scheme, by contrast, accesses the physical drives more evenly. Additionally, another data protection scheme is known informally as RAID10, wherein each of the main physical drives in a RAID5 system has a mirror drive.
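The difference in parity drive access between RAID4 and RAID5 can be sketched as follows. The rotation rule used here is only one possible interleaving, assumed for illustration; the text above does not specify how the parity logical blocks are distributed, and the function name parity_drive is hypothetical.

```python
from collections import Counter


def parity_drive(stripe, num_drives, rotate=True):
    """Return the index of the drive holding the parity block of a stripe.

    rotate=False models a dedicated parity drive (RAID4).
    rotate=True models interleaved parity (RAID5) using an assumed
    rotation in which parity moves back one drive per stripe.
    """
    if rotate:
        return (num_drives - 1 - stripe) % num_drives
    return num_drives - 1


num_drives = 4
stripes = range(8)

# Count how many parity updates each drive receives when every stripe
# is written once.
raid4 = Counter(parity_drive(s, num_drives, rotate=False) for s in stripes)
raid5 = Counter(parity_drive(s, num_drives) for s in stripes)
```

In this sketch all eight parity updates land on drive 3 under the dedicated arrangement, while the rotated arrangement gives each of the four drives two parity updates.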
The above-mentioned fault tolerant data protection schemes employed in drive arrays, while promoting the stripe data integrity of the drive array, also provide a means for rebuilding a physical replacement drive should one of the main physical drives of the drive array fail. The remaining drives of the array provide the information necessary to rebuild the physical replacement drive. For a RAID1 system, the physical replacement drive can be rebuilt from the failed disk drive's mirror drive. For a RAID4 or RAID5 system, a logical block on the replacement drive is rebuilt by XORing the other logical blocks of its stripe, and this process is repeated until all logical blocks of the replacement drive are rebuilt.
In a computer system employing the drive array, it is desirable that the drive array remain on-line should a physical drive of the drive array fail. Such is the case for the LAN. If a main physical drive should fail, drive arrays currently have the capability of allowing a spare physical replacement drive to be rebuilt without having to take the entire drive array off-line. Furthermore, intelligent drive array subsystems currently exist which can rebuild the replacement drive transparent to the computer system and while the drive array is still otherwise operational. Such a system is disclosed in Schultz et al., U.S. Pat. No. 5,101,492, entitled "Data Redundancy and Recovery Protection," which is hereby incorporated by reference.
Time is critical when rebuilding a physical drive of a drive array because if another main physical drive fails during the rebuilding process, all of the data stored may be lost. Thus, it is desirable to minimize the rebuild time of the physical replacement drive in order to improve the data integrity of the drive array.
Although it is desirable to rebuild a physical drive in a timely and efficient manner while the remainder of the drive array is still operational, the ongoing rebuild operation must compete with system requests, especially those system requests requiring the access of logical volumes that are fully operational. The drive array must process system requests along with internal requests generated by rebuilding operations. Thus, it would be desirable for a user of the computer system to have the capability to adjust the priority of the rebuild operations of the drive array, thereby assigning the rebuild operations of the drive array lower priority during peak computer system usage times and higher priority during times of reduced computer system activity.
The present invention relates to a new and improved rebuild algorithm and apparatus for rebuilding a physical replacement drive in a fault tolerant drive array. In the preferred embodiment of the present invention, a local processor of the drive array reads a stripe from a logical volume of the drive array that uses the physical replacement drive. The local processor then checks the stripe for consistency. If the stripe is inconsistent, the local processor sequentially rebuilds a predetermined number of stripes beginning with the checked stripe; however, if the checked stripe is consistent, then the local processor does not rebuild the stripe, but instead the local processor sequentially checks a next stripe for consistency, wherein the above-described process is repeated. Because the present invention reduces the number of required writes to the drive array, the rebuilding time of the physical replacement drive is decreased, thereby improving the data integrity of the drive array.
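The check-then-rebuild flow described above can be sketched as follows for a parity protected stripe. This is a simplified model under stated assumptions, not the preferred embodiment itself: a stripe is modeled as a list of equally sized blocks (data plus parity), a stripe is treated as consistent when the XOR of all its blocks is zero, checking resumes with the stripe following a rebuilt batch, and the names is_consistent, rebuild, and batch are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))


def is_consistent(stripe):
    """A parity stripe is consistent when all of its blocks XOR to zero."""
    return not any(xor_blocks(stripe))


def rebuild(stripes, replacement, batch=2):
    """Rebuild the replacement drive's blocks; return the number of writes.

    stripes     -- list of stripes, each a mutable list of blocks
    replacement -- index of the replacement drive within each stripe
    batch       -- assumed predetermined number of stripes rebuilt
                   once an inconsistent stripe is found
    """
    writes = 0
    i = 0
    while i < len(stripes):
        if is_consistent(stripes[i]):
            i += 1  # nothing to write; check the next stripe
            continue
        end = min(i + batch, len(stripes))
        for j in range(i, end):
            others = [b for k, b in enumerate(stripes[j]) if k != replacement]
            stripes[j][replacement] = xor_blocks(others)
            writes += 1
        i = end
    return writes


# Hypothetical array: two data drives plus parity; drive 1 was replaced
# and currently holds zeros in every stripe.
stripes = [
    [b"\x05\x05", b"\x00\x00", b"\x05\x05"],  # already consistent
    [b"\x01\x02", b"\x00\x00", b"\x11\x22"],  # inconsistent
    [b"\x0f\x0f", b"\x00\x00", b"\x0f\x0f"],  # consistent, inside the batch
    [b"\x03\x04", b"\x00\x00", b"\x33\x44"],  # inconsistent
]
writes = rebuild(stripes, replacement=1)
```

In this example only three of the four stripes are written, because the first stripe passes the consistency check and is skipped.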
The present invention also relates to a method for selecting priority between execution of physical requests originating from system requests and execution of physical requests originating from background drive array operations. In the preferred embodiment of the present invention, a user of the computer system can utilize a priority command which includes a pause interval parameter and a pause duration parameter. The priority command is submitted to the drive array wherein the local processor parses the priority command in order to determine the pause interval parameter and the pause duration parameter.
The pause duration and pause interval parameters define rebuild priority. The local processor executes a foreground task which parses a logical command list submitted by the host processor into host logical requests. In the foreground task, the local processor executes the host logical requests, thereby forming physical requests from each host logical request. In the preferred embodiment of the present invention, the local processor waits for the time specified by the pause interval parameter and then delays execution of the foreground task for the time specified by the pause duration parameter. This delay allows more physical requests generated by background disk operations to be processed. In the preferred embodiment of the present invention, the background disk operations include rebuild operations. When the foreground task is delayed, the local processor processes more physical requests submitted by rebuild operations, thereby effectively advancing the priority of the rebuild operations. Thus, the user of the computer system can adjust the priority between rebuild operations and computer system requests.
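The effect of the two parameters can be illustrated with a simple tick-based model. The model is an assumption made for illustration: time is counted in abstract ticks, the foreground task is assumed to be saturated with host requests, and background rebuild requests are assumed to progress only while the foreground task is delayed. The function name simulate and the tick units are hypothetical.

```python
def simulate(total_ticks, pause_interval, pause_duration):
    """Return (foreground_ticks, background_ticks) over total_ticks.

    The foreground task runs for pause_interval ticks, is then delayed
    for pause_duration ticks, and the cycle repeats.
    """
    if pause_interval < 0 or pause_duration < 0:
        raise ValueError("parameters must be non-negative")
    if pause_interval + pause_duration == 0:
        raise ValueError("at least one parameter must be positive")
    foreground = background = 0
    tick = 0
    while tick < total_ticks:
        run = min(pause_interval, total_ticks - tick)
        foreground += run  # host logical requests are processed
        tick += run
        pause = min(pause_duration, total_ticks - tick)
        background += pause  # rebuild requests are processed
        tick += pause
    return foreground, background


# Low rebuild priority: long interval between pauses, short pauses.
low_priority = simulate(100, pause_interval=8, pause_duration=2)

# High rebuild priority: short interval between pauses, long pauses.
high_priority = simulate(100, pause_interval=2, pause_duration=8)
```

Under these assumptions, the first setting gives rebuild operations 20 of every 100 ticks and the second gives them 80, which shows how a longer pause duration or a shorter pause interval advances the priority of the rebuild operations.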