1. Field of the Invention
This invention relates to servicing a data storage device and more particularly relates to selective servicing of stripe groups in a redundant array of independent disks (“RAID”) system.
2. Description of the Related Art
A data storage system typically stores data to and retrieves data from one or more data storage devices. The data storage devices are typically hard disk drives, optical storage devices, magnetic tape storage devices, or semiconductor-based storage devices such as Flash memory or dynamic random access memory (“DRAM”). One type of data storage system is a RAID system. The RAID system stores redundant sets of data on one or more data storage devices. The RAID data storage devices are typically hard disks. The RAID hard disks may be organized into a plurality of stripes. Each stripe is a physical area of the hard disk configured to receive and store data. Stripes may be organized into stripe groups to form one or more logical disks from a plurality of hard disks. A first stripe on a first hard disk may be associated with a second stripe on a second hard disk to form the stripe group. A stripe group may allow a logical disk to comprise stripes from a plurality of hard disks. Any number of stripes from any number of hard disks may form the stripe group, although a stripe group typically comprises one stripe from each hard disk. A stripe group may include redundant data such as parity data to allow the data from a failed stripe to be recovered.
FIG. 1 is a schematic block diagram illustrating one embodiment of a RAID array 100. The RAID array 100 includes four hard disks, hard disk one (1) 105a, hard disk two (2) 105b, hard disk three (3) 105c, and hard disk four (4) 105d. Each hard disk 105 comprises a plurality of stripes 110, 120, 130, or 140. For example, hard disk one (1) 105a comprises four stripes, stripe zero (0) 110a, stripe three (3) 110b, stripe six (6) 110c, and stripe P3 110d. The stripes on the group of hard disks 105 are organized into stripe groups as illustrated in Table 1.
TABLE 1
                              Stripes
Stripe Group   Hard Disk 1   Hard Disk 2   Hard Disk 3   Hard Disk 4
0              0             1             2             P0
1              3             4             P1            5
2              6             P2            7             8
3              P3            9             A             B
The depicted RAID array 100 is a parity RAID array with parity stripes 110d, 120c, 130b, 140a comprising redundant data for each stripe group. The data of a stripe 110, 120, 130, 140 of a stripe group may be recovered using the parity stripes 110d, 120c, 130b, 140a. Stripe group zero (0) comprises stripe zero (0) 110a of hard disk one (1) 105a, stripe one (1) 120a of hard disk two (2) 105b, stripe two (2) 130a of hard disk three (3) 105c, and stripe P0 or parity stripe zero 140a of hard disk four (4) 105d. Stripe group one (1) comprises stripe three (3) 110b of hard disk one (1) 105a, stripe four (4) 120b of hard disk two (2) 105b, stripe P1 or parity stripe one 130b of hard disk three (3) 105c, and stripe five (5) 140b of hard disk four (4) 105d. Stripe group two (2) comprises stripe six (6) 110c of hard disk one (1) 105a, stripe P2 or parity stripe two 120c of hard disk two (2) 105b, stripe seven (7) 130c of hard disk three (3) 105c, and stripe eight (8) 140c of hard disk four (4) 105d. Stripe group three (3) comprises stripe P3 or parity stripe three 110d of hard disk one (1) 105a, stripe nine (9) 120d of hard disk two (2) 105b, stripe A 130d of hard disk three (3) 105c, and stripe B 140d of hard disk four (4) 105d. Each stripe group may form a logical disk in the RAID array 100. In an alternate embodiment, the RAID array 100 may be a mirrored RAID array, with each first stripe mirrored or copied to a second stripe. Typically the first stripe and the mirrored second stripe reside on different hard disks 105.
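The stripe group layout of Table 1 may be illustrated as a small lookup structure. The following Python sketch is offered only as an illustration; the names STRIPE_GROUPS, parity_disk, and data_stripes are hypothetical and do not appear in the figures.

```python
# Table 1 as a mapping: stripe group number -> stripe labels on
# hard disks one through four, in that order. Labels starting with
# "P" denote parity stripes. All names here are illustrative only.
STRIPE_GROUPS = {
    0: ["0", "1", "2", "P0"],
    1: ["3", "4", "P1", "5"],
    2: ["6", "P2", "7", "8"],
    3: ["P3", "9", "A", "B"],
}


def parity_disk(group):
    """Return the 1-based hard disk number holding the group's parity stripe."""
    for disk_number, label in enumerate(STRIPE_GROUPS[group], start=1):
        if label.startswith("P"):
            return disk_number
    raise ValueError(f"stripe group {group} has no parity stripe")


def data_stripes(group):
    """Return the labels of the non-parity stripes of a stripe group."""
    return [label for label in STRIPE_GROUPS[group] if not label.startswith("P")]
```

In this sketch the parity stripe moves from hard disk four for stripe group zero to hard disk one for stripe group three, matching the rotation shown in Table 1.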
In some prior art arrangements, the RAID array 100 is included in a RAID system (not shown). In addition to storing and retrieving data, the RAID system performs one or more service processes on hard disks 105. Service processes may include initializing a hard disk 105, rebuilding data on a replacement hard disk 105, and checking the consistency of the redundant data on the hard disk 105. The RAID system typically performs the service process on each stripe group.
In one example, the RAID system performs the service process of initializing a hard disk in a parity RAID array 100 by performing an exclusive or (“XOR”) operation on all data stripes of each stripe group to generate parity data for the parity stripes 110d, 120c, 130b, 140a of each stripe group. The parity stripe 110d, 120c, 130b, 140a is stored on a hard disk 105. For example, the RAID system may XOR stripe zero (0) 110a, stripe one (1) 120a, and stripe two (2) 130a and write the resulting parity data to stripe P0 140a to initialize stripe group zero (0). The RAID system typically initializes each stripe group. Once the initialization service process is complete, any data within the stripe group may be updated along with new parity data on the hard disks 105. The new parity data is generated by XORing the old data, the old parity data, and the new data. In one arrangement, the RAID system writes a pattern of binary zeros (0s) to each stripe of a stripe group during the initialization process.
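The parity generation and parity update described above may be sketched as follows. This is a minimal Python illustration that assumes each stripe is held in memory as a byte string of equal length; the function names xor_blocks, initialize_parity, and update_parity are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Return the bytewise XOR of one or more equal-length byte strings."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def initialize_parity(data_stripes):
    """Generate parity for a stripe group by XORing all of its data stripes."""
    return xor_blocks(*data_stripes)


def update_parity(old_data, old_parity, new_data):
    """Generate new parity from the old data, the old parity, and the new data."""
    return xor_blocks(old_data, old_parity, new_data)
```

Because XOR is its own inverse, XORing the old data into the old parity removes that stripe's contribution, and XORing in the new data adds the replacement contribution, so the updated parity equals the parity that would result from recomputing over the whole stripe group.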
In an alternate arrangement, the service process of initializing a mirrored RAID array consisting of two hard disks 105a, 105b entails copying or mirroring the data of each stripe of each stripe group from the first hard disk 105a to the second hard disk 105b. Thus if the RAID array 100 were a mirrored RAID array, stripe one (1) 120a would mirror stripe zero (0) 110a, stripe four (4) 120b would mirror stripe three (3) 110b, and the like.
The RAID system may also perform the service process of rebuilding a replacement disk that replaces a failed hard disk 105 of a RAID array. Rebuilding consists of regenerating and writing all of the data for each stripe of a failed hard disk to the replacement hard disk at the same relative address. For a parity RAID array 100, data from a failed hard disk 105 for each stripe group is regenerated by computing the XOR of the contents of corresponding stripes on the surviving hard disks 105 including the parity stripe. For example, if hard disk one (1) 105a failed, the RAID system may regenerate stripe zero (0) 110a of stripe group zero (0) from stripe one (1) 120a, stripe two (2) 130a, and parity stripe P0 140a. For a two-disk mirrored RAID array, replacement data is regenerated from the surviving mirrored hard disk 105. For example, if the RAID array 100 is a mirrored RAID array, the data of hard disk one (1) 105a is regenerated from hard disk two (2) 105b.
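The rebuild of a parity RAID array may be sketched as follows. This Python illustration assumes each hard disk is modeled as a list of equal-length byte strings indexed by stripe group; the names xor_stripes and rebuild_failed_disk are hypothetical.

```python
def xor_stripes(stripes):
    """Return the bytewise XOR of a non-empty list of equal-length stripes."""
    result = bytearray(len(stripes[0]))
    for stripe in stripes:
        if len(stripe) != len(result):
            raise ValueError("all stripes must have the same length")
        for i, byte in enumerate(stripe):
            result[i] ^= byte
    return bytes(result)


def rebuild_failed_disk(disks, failed):
    """Regenerate every stripe of the failed disk from the surviving disks.

    disks  -- list of disks, each a list of stripes indexed by stripe group
    failed -- index into disks of the failed disk (its contents are ignored)

    The returned list holds one regenerated stripe per stripe group, in the
    same order, so each stripe can be written to the replacement disk at the
    same relative address.
    """
    survivors = [disk for i, disk in enumerate(disks) if i != failed]
    group_count = len(survivors[0])
    return [
        xor_stripes([disk[group] for disk in survivors])
        for group in range(group_count)
    ]
```

The sketch does not need to know which disk holds the parity stripe of a given stripe group, because the XOR of all surviving stripes yields the missing stripe whether that stripe held data or parity.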
In a certain embodiment, the RAID system also checks the integrity of redundant data for each stripe group. For example, the consistency check service process on a mirrored RAID array verifies that the data on both drives of the mirrored pair is exactly the same. For the RAID array 100 configured as a parity RAID array, a consistency check service process calculates the parity data for each stripe group by XORing all the data from stripes of the stripe group and compares the resultant parity against the stored parity stripe of the stripe group.
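The two consistency checks may be sketched as follows. This Python illustration assumes stripes are equal-length byte strings held in memory; the names parity_group_is_consistent and mirror_is_consistent are hypothetical.

```python
def parity_group_is_consistent(data_stripes, stored_parity):
    """Recompute parity over the data stripes and compare it to the stored parity."""
    computed = bytearray(len(stored_parity))
    for stripe in data_stripes:
        if len(stripe) != len(stored_parity):
            raise ValueError("all stripes must have the same length")
        for i, byte in enumerate(stripe):
            computed[i] ^= byte
    return bytes(computed) == stored_parity


def mirror_is_consistent(first_disk, second_disk):
    """Return True when both disks of a mirrored pair hold identical stripes."""
    return list(first_disk) == list(second_disk)
```

A False result indicates only that the redundant data disagrees with the stored data; the sketch does not attempt to decide which copy is correct.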
The RAID system typically performs a service process on consecutive stripe groups in a predefined sequence. The stripe groups are typically numbered, and the RAID system services stripe groups in an ascending numerical order. For example, the RAID system may perform the service process on stripe group zero (0) followed by performing the service process on stripe group one (1). A service process can proceed in the predefined sequence while allowing input/output (“I/O”) applications to access data on the array at stripe groups that have been serviced. If an I/O application requests access to data in a stripe group that has not completed a service process such as initialization, the RAID system has typically pursued one of two options. One option is to delay the execution of the I/O application until the service process is complete for the associated stripe group. This delay is undesirable, as it slows the performance of the data processing system to an unacceptable extent.
The other currently available option is for the RAID system to re-direct the service process to the stripe group containing data requested by the I/O application and delay the data access of the I/O application until the service process is completed for the re-directed stripe group. After the service process is completed for the re-directed stripe group, the service process may continue as if there had not been a re-direction, resuming the service process for the next consecutive stripe group.
For example, if the I/O application receives an I/O command to access a data block in stripe group six (6) and the service process for stripe group six (6) is not complete, the RAID system may re-direct the service process to stripe group six (6) even though the process has only completed stripe group one (1). The I/O application accesses stripe group six (6) containing the requested data block subsequent to the completion of the service process on stripe group six (6). Unfortunately, the RAID system does not track re-directed service processes for stripe groups. As a result, if an I/O application attempts to repeatedly access stripe group six (6), the RAID system will repeatedly re-direct the service process to stripe group six (6) until the service process is complete for stripe group six (6) in the service process's normal predefined sequence. Consequently, the RAID system performs the service process on one or more stripe groups multiple times, slowing performance.
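The cost of untracked re-direction may be illustrated with a toy simulation. The following Python sketch is a simplified model and not a description of any particular RAID system; the function count_services and its parameters are hypothetical, and each I/O request is assumed to arrive immediately after the sequential service process completes a given stripe group.

```python
from collections import Counter


def count_services(group_count, io_requests):
    """Count how many times each stripe group is serviced without tracking.

    group_count -- number of stripe groups, serviced in ascending order
    io_requests -- list of (completed, target) pairs: an I/O request for
                   stripe group `target` arriving just after the sequential
                   service process has completed stripe group `completed`
    """
    services = Counter()
    for current in range(group_count):
        services[current] += 1  # service in the normal predefined sequence
        for completed, target in io_requests:
            # A request for a group the sequence has not yet reached triggers
            # a re-directed service every time, since nothing records that an
            # earlier re-direction already serviced that group.
            if completed == current and target > current:
                services[target] += 1
    return services
```

Under these assumptions, count_services(8, [(1, 6)] * 3)[6] evaluates to 4: stripe group six (6) is serviced three times by re-direction and once more in the normal sequence, while every other stripe group is serviced once.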
From the foregoing discussion, it should be clear that a need exists for an apparatus, system, and method that track the completion of a service process for each stripe group and avoid repeating the service process for a stripe group. In addition, the apparatus, system, and method should allow concurrent access to a stripe group of the RAID array even if the access encounters a delay pending the completion of the re-directed service process. Beneficially, such an apparatus, system, and method would reduce the time a stripe group is inaccessible, thereby improving data storage system performance.