This invention relates to storage arrays. More particularly, the invention relates to prioritizing I/O requests to improve host I/O performance and availability of a storage array during rebuild.
Conventional disk array data storage systems have multiple storage disk drive devices that are arranged and coordinated to form a single mass storage system. The common design goals for mass storage systems include low cost per megabyte, high input/output performance, and high data availability. Data availability involves the ability to access data stored in the storage system while ensuring continued operation in the event of a disk or component failure. Data availability is often provided through the use of redundancy where data, or relationships among data, are stored in multiple locations in the storage system. In the event of disk failure, redundant data is retrieved from the operable portion of the system and used to regenerate the original data that is lost due to the component failure.
There are two common methods for storing redundant data on disk drives: mirrored and parity. In mirrored redundancy, the data being stored is duplicated and stored in two separate areas of the storage system that are the same size (an original data storage area and a redundant storage area). In parity redundancy, the original data is stored in an original data storage area and the redundant data is stored in a redundant storage area, but because the redundant data is only parity data the size of the redundant storage area is less than the size of the original data storage area.
RAID (Redundant Array of Independent Disks) storage systems are disk array systems in which part of the physical storage capacity is used to store redundant data. RAID systems are typically characterized as one of seven architectures or levels, enumerated under the acronym RAID. A RAID 0 architecture is a disk array system that is configured without any redundancy. Since this architecture is really not a redundant architecture, RAID 0 is often omitted from a discussion of RAID systems.
A RAID 1 architecture involves storage disks configured according to mirrored redundancy. Original data is stored on one set of disks and a duplicate copy of the data is kept on separate disks. The RAID 2 through RAID 6 architectures all involve parity-type redundant storage. Of particular interest, a RAID 5 architecture distributes data and parity information across all of the disks. Typically, the disks are divided into equally sized address areas referred to as “blocks”. A set of blocks, one from each disk, that have the same unit address ranges is referred to as a “stripe”. In RAID 5, each stripe has N blocks of data and one parity block which contains redundant information for the data in the N blocks.
In RAID 5, the parity block is cycled across different disks from stripe-to-stripe. For example, in a RAID 5 architecture having five disks, the parity block for the first stripe might be on the fifth disk; the parity block for the second stripe might be on the fourth disk; the parity block for the third stripe might be on the third disk; and so on. The parity block for succeeding stripes typically “precesses” around the disk drives in a helical pattern (although other patterns are possible). RAID 2 through RAID 4 architectures differ from RAID 5 in how they place the parity block on the disks.
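By way of illustration only, the rotation in the five-disk example above can be sketched in Python. The function name raid5_parity_disk and the zero-based stripe and disk numbering are assumptions made for this sketch; the rule shown is just one pattern that reproduces the example, and other patterns are possible.

```python
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """Return the zero-based index of the disk holding parity for a stripe.

    Hypothetical helper: parity starts on the last disk and moves one disk
    toward the first with each succeeding stripe, wrapping around.
    """
    return (num_disks - 1 - stripe) % num_disks


# Five-disk example, printed with one-based stripe and disk numbers.
for stripe in range(6):
    disk = raid5_parity_disk(stripe, 5)
    print(f"stripe {stripe + 1}: parity on disk {disk + 1}")
```

With five disks, the sixth stripe wraps back to the fifth disk, which is the cyclic behavior described above.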
A RAID 6 architecture is similar to RAID 4 and 5 in that data is striped, but is dissimilar in that it utilizes two independent and distinct parity values for the original data, referred to herein as P and Q. The P parity is commonly calculated using a bit by bit Exclusive OR function of corresponding data chunks in a stripe from all of the original data disks. This corresponds to a one equation, one unknown, sum of products calculation. On the other hand, the Q parity is calculated linearly independent of P and using a different algorithm for sum of products calculation. As a result, each parity value is calculated using an independent algorithm and each is stored on a separate disk. Consequently, a RAID 6 system can rebuild data (assuming rebuild space is available) even in the event of a failure of two separate disks in the stripe, whereas a RAID 5 system can rebuild data only in the event of no more than a single disk failure in the stripe.
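The P parity calculation and a single-block rebuild can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper name xor_blocks and the sample block contents are hypothetical, and the Q parity is not shown because the text above does not specify its algorithm.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bit-by-bit Exclusive OR of equally sized blocks (hypothetical helper)."""
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must be the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe (made-up contents) and their P parity.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
p_parity = xor_blocks(data)

# Simulate losing the second block, then regenerate it from the survivors.
lost_index = 1
survivors = [block for i, block in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [p_parity])
assert rebuilt == data[lost_index]
```

Because XOR is its own inverse, combining the surviving data blocks with P yields the missing block, which is the one equation, one unknown case noted above.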
Similar to RAID 5, a RAID 6 architecture distributes the two parity blocks across all of the data storage devices in the stripe. Thus, in a stripe of N+2 data storage devices, each stripe has N blocks of original data and two blocks of independent parity data. One of the blocks of parity data is stored in one of the N+2 data storage devices, and the other of the blocks of parity data is stored in another of the N+2 data storage devices. Similar to RAID 5, the parity blocks in RAID 6 are cycled across different disks from stripe-to-stripe. For example, in a RAID 6 system using five data storage devices in a given stripe, the parity blocks for the first stripe of blocks may be written to the fourth and fifth devices; the parity blocks for the second stripe of blocks may be written to the third and fourth devices; the parity blocks for the third stripe of blocks may be written to the second and third devices; etc. Typically, again, the location of the parity blocks for succeeding blocks shifts to the succeeding logical device in the stripe, although other patterns may be used.
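The five-device placement example above can be reproduced with a short Python sketch. The function name raid6_parity_devices and the zero-based numbering are assumptions for illustration; the text does not say which of the two devices holds P and which holds Q, so the sketch returns the pair without assigning roles.

```python
def raid6_parity_devices(stripe: int, num_devices: int) -> tuple[int, int]:
    """Return zero-based indices of the two parity devices for a stripe.

    Hypothetical rule matching the example: the pair starts on the last two
    devices and shifts by one device per stripe, wrapping around.
    """
    first = (num_devices - 2 - stripe) % num_devices
    second = (first + 1) % num_devices
    return first, second


# Five-device example, printed with one-based stripe and device numbers.
for stripe in range(5):
    a, b = raid6_parity_devices(stripe, 5)
    print(f"stripe {stripe + 1}: parity on devices {a + 1} and {b + 1}")
```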
A hierarchical data storage system permits data to be stored according to different techniques. In a hierarchical RAID system, data can be stored according to multiple RAID architectures, such as RAID 1 and RAID 5, to afford tradeoffs between the advantages and disadvantages of the redundancy techniques.
Additionally, a data storage system may permit data to be stored in multiple redundancy groups co-existing within the system. In a RAID system, each redundancy group is a set of disks in the RAID system that use the same RAID architecture (or RAID architectures for a hierarchical RAID system) to provide redundancy. By way of example, in a RAID system having a total of thirty disks, ten disks may be in a first redundancy group using one RAID architecture(s) (e.g., using RAID 1), another twelve disks may be in a second redundancy group using a second RAID architecture(s) (e.g., using RAID 1 and RAID 5), and the remaining eight disks may be in a third redundancy group using a third RAID architecture(s) (e.g., using RAID 1 and RAID 6).
U.S. Pat. No. 5,392,244 to Jacobson et al., entitled “Memory Systems with Data Storage Redundancy Management”, describes a hierarchical RAID system that enables data to be migrated from one RAID type to another RAID type as data storage conditions and space demands change. This patent, which is assigned to Hewlett-Packard Company, describes a multi-level RAID architecture in which physical storage space is mapped into a RAID-level virtual storage space having mirrored and parity RAID areas (e.g., RAID 1 and RAID 5). The RAID-level virtual storage space is then mapped into an application-level virtual storage space, which presents the storage space to the user as one large contiguously addressable space. During operation, as user storage demands change at the application-level virtual space, data can be migrated between the mirrored and parity RAID areas at the RAID-level virtual space to accommodate the changes. For instance, data once stored according to mirrored redundancy may be shifted and stored using parity redundancy, or vice versa. The U.S. Pat. No. 5,392,244 patent is hereby incorporated by reference to provide additional background information.
In the event that a disk in a RAID system fails, the data in the array is “rebuilt”, a process which typically involves issuing multiple read and/or write requests to the disk array. Typically, the RAID system is also available for read and write requests from a host computer during this rebuilding process. Unfortunately, these host requests often require access to the same resources as are used by the rebuild requests, and therefore compete with the rebuild requests.
In some systems, such competition between host requests and rebuild requests is resolved by either always delaying the host requests in favor of the rebuild requests or always delaying the rebuild requests in favor of the host requests. Always delaying the host requests can result in situations where the data in the storage array is rebuilt more quickly, but the performance of the system in responding to host requests is diminished even though the storage array is not close to permanently losing data. Always delaying the rebuild requests can result in situations where the performance of the system in responding to host requests is not diminished, but rebuilding data in the storage array can take a very long time even though the storage array is close to permanently losing data. A storage array is close to permanently losing data when, for example, failure of one more particular disk in the storage array would result in data loss.
The improvement of host I/O performance and availability of a storage array during rebuild by prioritizing I/O requests described below addresses these and other disadvantages.
Improving host I/O performance and availability of a storage array during rebuild by prioritizing I/O requests is described herein.
According to one aspect, rebuild I/O requests are given priority over host I/O requests when the storage array is close to permanently losing data (for example, failure of one more particular disk in the storage array would result in data loss). Rebuild I/O requests continue to have priority over host I/O requests until the storage array is no longer close to permanently losing data, at which point host I/O requests are given priority over rebuild I/O requests.
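The mode selection described in this aspect can be sketched as follows. This is a simplified Python illustration, not a definitive implementation: the names Mode and select_mode are hypothetical, and the test for being close to permanently losing data is reduced to comparing a count of failed disks against the number of failures the redundancy scheme tolerates, which ignores which particular disks have failed.

```python
from enum import Enum


class Mode(Enum):
    HOST_PRIORITY = "host priority"
    REBUILD_PRIORITY = "rebuild priority"


def select_mode(failed_disks: int, fault_tolerance: int) -> Mode:
    """Pick the priority mode for a redundancy group (hypothetical helper).

    Assumption: the array is close to permanently losing data when at least
    one disk has failed and one more failure would exceed fault_tolerance.
    """
    close_to_data_loss = failed_disks > 0 and failed_disks >= fault_tolerance
    return Mode.REBUILD_PRIORITY if close_to_data_loss else Mode.HOST_PRIORITY


# A group tolerating two failures (as RAID 6 does), over the course of a rebuild.
print(select_mode(failed_disks=1, fault_tolerance=2))  # host requests favored
print(select_mode(failed_disks=2, fault_tolerance=2))  # rebuild requests favored
print(select_mode(failed_disks=1, fault_tolerance=2))  # one disk rebuilt: host again
```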
According to another aspect, host I/O requests and rebuild I/O requests are both input to a queue to await processing. When rebuild I/O requests are to have priority (e.g., in a “rebuild priority” mode), new I/O requests (whether host or rebuild) are input to the bottom of the queue and propagate to the top of the queue, where they are processed in a first-in-first-out (FIFO) manner. However, when host I/O requests are to have priority (e.g., in a “host priority” mode), new rebuild I/O requests are input to the bottom of the queue and new host I/O requests are inserted into the queue below any other host I/O requests but above any rebuild I/O requests. This has the effect of allowing host I/O requests to be processed before rebuild I/O requests when in the host priority mode, but allowing host I/O and rebuild I/O requests to be processed in the order they are received when in the rebuild priority mode (which results in allocation of more system resources to rebuild I/O requests than host I/O requests).
According to another aspect, allocation of one or more system resources to host I/O requests is restricted when rebuild I/O requests are to have priority over host I/O requests. In this case, I/O requests (whether host or rebuild) are processed in the order they are received so long as the system resource(s) usage by host I/O requests has not exceeded a threshold amount. If the threshold amount is exceeded, then rebuild I/O requests are processed before host I/O requests.
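A minimal Python sketch of this threshold rule follows. The function name pick_next, the representation of usage as a fraction, and the tuple layout of queued requests are all assumptions for illustration. The sketch also assumes that a host request is still served when the threshold is exceeded but no rebuild request is waiting, a case the text above does not address.

```python
def pick_next(queue: list[tuple[str, str]], host_usage: float, host_limit: float):
    """Select the next request while in rebuild priority mode (illustrative).

    queue holds (kind, name) tuples with index 0 as the oldest request.
    host_usage and host_limit are hypothetical fractions of a system resource.
    """
    if not queue:
        return None
    if host_usage > host_limit:
        # Host requests are over their allocation: serve the oldest
        # rebuild request first, if one is waiting.
        for index, (kind, _name) in enumerate(queue):
            if kind == "rebuild":
                return queue.pop(index)
    # Otherwise process requests in the order they were received.
    return queue.pop(0)


queue = [("host", "H1"), ("rebuild", "R1"), ("host", "H2")]
print(pick_next(queue, host_usage=0.4, host_limit=0.5))  # under the limit: oldest first
print(pick_next(queue, host_usage=0.6, host_limit=0.5))  # over the limit: rebuild first
```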
According to another aspect, processing of I/O requests (whether host or rebuild) occurs in multiple phases. The processing of a particular request can be preempted between two phases in favor of a higher priority I/O request. By way of example, if a new rebuild I/O request is received while in a rebuild priority mode then processing of a host I/O request can be preempted, allowing the rebuild I/O request to be processed.
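The preemption between phases can be sketched in Python as follows. The names PhasedRequest and run_with_preemption are hypothetical, and the sketch is simplified to one request in progress and one newly arriving request; the phase counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class PhasedRequest:
    kind: str    # "host" or "rebuild"
    name: str
    phases: int  # number of processing phases


def run_with_preemption(current, newcomer, arrives_after_phase, rebuild_priority):
    """Return the order in which phases execute (illustrative only).

    newcomer arrives once current has finished phase arrives_after_phase.
    current is preempted at that phase boundary only if newcomer belongs to
    the favored class of request and current does not.
    """
    preferred = "rebuild" if rebuild_priority else "host"
    outranks = newcomer.kind == preferred and current.kind != preferred
    log = []
    newcomer_done = False
    for phase in range(1, current.phases + 1):
        log.append(f"{current.name}:{phase}")
        if phase == arrives_after_phase and outranks:
            # Preempt between phases, run the newcomer, then resume current.
            log.extend(f"{newcomer.name}:{p}" for p in range(1, newcomer.phases + 1))
            newcomer_done = True
    if not newcomer_done:
        # No preemption: the newcomer waits until current has completed.
        log.extend(f"{newcomer.name}:{p}" for p in range(1, newcomer.phases + 1))
    return log


host = PhasedRequest("host", "H", 3)
rebuild = PhasedRequest("rebuild", "R", 2)
print(run_with_preemption(host, rebuild, 1, rebuild_priority=True))
print(run_with_preemption(host, rebuild, 1, rebuild_priority=False))
```

In the rebuild priority case the host request is suspended after its first phase and resumes once the rebuild request completes; in the host priority case it runs to completion first.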