1. Field of the Invention
The present invention relates, in general, to redundant data storage, and, more particularly, to software, systems, controllers and methods for reorganizing data in a RAID (redundant array of independent disks) system to improve performance during rebuild, capacity expansion, and migration operations.
2. Relevant Background
Recent years have seen a proliferation of computers and storage subsystems. Early computer systems relied heavily on direct-attached storage (DAS) consisting of one or more disk drives coupled to a system bus. More recently, network-attached storage (NAS) and storage area network (SAN) technologies have been used to provide storage with greater capacity, higher reliability, and higher availability.
RAID (Redundant Array of Independent/Inexpensive Disks) is an organization of data on a plurality of disks to achieve varying levels of availability and performance. The plurality of disks used to implement any particular RAID volume may be directly attached to a single computer, or implemented in network-connected drives that are accessible by one or more computers. In general, RAID uses combinations of multiple drives to obtain performance, capacity and reliability that exceed those of a single large drive. The array of drives appears to the host computer as a single logical drive. Several levels of RAID architectures have been defined where each level provides a different level of fault tolerance, performance and data availability.
RAID storage systems can be implemented in hardware or software. In a hardware implementation the RAID algorithms are built into a controller that connects to the host computer's I/O bus. In a software implementation the RAID algorithms are incorporated into software that runs on the main processor of the host computer in conjunction with the operating system. In network systems, a controller is implemented within a storage network, or as a gateway to a storage network, and the RAID algorithms operate in the hardware/software of the network RAID controller so as to relieve the host computer of responsibility for the RAID implementation.
An important concept used by many RAID architectures is “striping,” which spreads data across the disks in the array. A typical storage access request identifies a starting logical block address (LBA) and a count of the number of blocks involved in the access request. A block is the smallest unit of data that can be read from or written to a disk. The data access requests are handled in segments referred to as “strips.” A strip represents a quantity of data that can be accessed on a single disk. Usually a strip comprises a group of blocks, although a strip may be smaller than a single block in some systems. In other words, a strip is the unit of data interleaving for a RAID array. For example, in a four-disk RAID-5 array the first strip is placed on the first disk, the second strip is placed on the second disk, the third strip is placed on the third disk, the fourth strip is placed on the fourth disk, the fifth strip is placed on the first disk and so on. This spreading of data increases performance because the multiple drives can work concurrently to service data access requests during heavy load operations.
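The round-robin placement in the four-disk example above can be sketched as follows. This is a minimal illustration only: the function names, the `blocks_per_strip` parameter, and the tuple layout are assumptions introduced here, and, like the example in the text, the sketch ignores the space taken by parity strips.

```python
def locate_block(lba, blocks_per_strip, num_disks):
    """Map a logical block address to (disk, row, offset within strip).

    Hypothetical helper: strips are dealt out round-robin across the
    disks, and parity placement is deliberately ignored.
    """
    strip_index = lba // blocks_per_strip   # which strip holds the block
    disk = strip_index % num_disks          # strips rotate across the disks
    row = strip_index // num_disks          # strip position on that disk
    offset = lba % blocks_per_strip         # block position inside the strip
    return disk, row, offset


def split_request(start_lba, count, blocks_per_strip, num_disks):
    """Break an (LBA, count) request into per-strip pieces.

    Returns a list of (disk, row, offset, length) tuples, one per strip
    touched by the request.
    """
    pieces = []
    lba, remaining = start_lba, count
    while remaining > 0:
        disk, row, offset = locate_block(lba, blocks_per_strip, num_disks)
        length = min(remaining, blocks_per_strip - offset)
        pieces.append((disk, row, offset, length))
        lba += length
        remaining -= length
    return pieces
```

A request that crosses a strip boundary is split into pieces that land on different disks, which is what allows the drives to service it concurrently.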
RAID level 3 (RAID-3), RAID level 4 (RAID-4) and RAID level 5 (RAID-5) are organizations of data for an array of n+1 disks that provide enhanced performance through the use of striping and enhanced data availability through the use of parity. Each disk in the RAID array is referred to as a “member” of the array. A parity block/strip is associated with every n data blocks/strips. The data and parity information are distributed over the n+1 disks so that if a single disk fails, all of the data can be recovered. A “stripe” is the collection of a parity block and all of the data blocks that contribute to it. RAID-3 and RAID-4 are systems in which data blocks are organized into strips that are interleaved among the disks and protected by parity, and in which all of the parity is written on a single disk. RAID-5 is a level of organization of data for a RAID array in which data blocks are organized into strips that are interleaved among the disks and protected by parity, and in which the parity information is distributed over all of the disks in the array. In general, RAID-5 provides a suitable mix of performance and protection for most applications.
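The difference between dedicated parity (RAID-3 and RAID-4) and distributed parity (RAID-5) can be illustrated with a short sketch. The rotation rule used for RAID-5 below is one possible scheme chosen for illustration; the text above does not specify a particular rotation, and the function names are hypothetical.

```python
def parity_disk(stripe, num_disks, level):
    """Return the index of the disk holding parity for a given stripe."""
    if level in (3, 4):
        # Dedicated parity: always the last member of the array.
        return num_disks - 1
    if level == 5:
        # Assumed rotation: parity moves back one disk per stripe.
        return (num_disks - 1 - stripe) % num_disks
    raise ValueError("only RAID levels 3, 4 and 5 are sketched here")


def stripe_layout(stripe, num_disks, level):
    """Label each member's strip in one stripe as data ('D<k>') or parity ('P')."""
    p = parity_disk(stripe, num_disks, level)
    next_data = stripe * (num_disks - 1)    # n data strips per stripe
    layout = []
    for disk in range(num_disks):
        if disk == p:
            layout.append("P")
        else:
            layout.append(f"D{next_data}")
            next_data += 1
    return layout
```

Under these assumptions, successive stripes of a RAID-4 array all place "P" on the same member, whereas a RAID-5 array moves "P" to a different member on each stripe.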
Each bit of the parity block is the Exclusive-OR (“XOR”) of the corresponding bit in each of the n corresponding data blocks. When data blocks are written, the parity information is computed and stored in the corresponding parity block. Under normal conditions, subsequent reads can access the data blocks without any parity operations. Subsequent write operations that modify the stored data must recompute the parity information and write to each disk that holds a block that has changed as well as to the disk that holds the parity block.
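The parity computation described above can be sketched in a few lines. The helper names are hypothetical, and blocks are modeled as equal-length `bytes` objects. The `update_parity` function shows a common read-modify-write shortcut that relies on the XOR identity (new parity = old parity XOR old data XOR new data); it is offered as an illustration and is not stated in the text above.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Bitwise XOR of equal-length byte blocks, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block for the n data blocks of one stripe."""
    return xor_blocks(*data_blocks)


def update_parity(old_parity, old_data, new_data):
    """Recompute parity after one data block changes.

    XOR-ing out the old data and XOR-ing in the new data gives the same
    result as recomputing parity over the whole stripe.
    """
    return xor_blocks(old_parity, old_data, new_data)
```

With this shortcut, a small write touches only the disk holding the changed block and the disk holding the parity block, rather than every member of the stripe.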
At various times in the life of a RAID system the data and parity blocks must be moved and/or reconstructed. For example, in the event of a disk failure, the data blocks and parity blocks that were stored on the failed drive are rebuilt onto a replacement drive using the data and parity blocks on the surviving drives in a process that is referred to as re-striping the array across all drives. This is done, for example, by calculating the XOR of the contents of the parity and/or data blocks remaining on the surviving disks that contributed to the block on the failed drive. “Expansion” generally refers to adding a new member to an existing array, which requires re-striping of the data and parity. It is possible to create an entirely new array (i.e., a “RAID set”) from any added members as controllers generally support multiple RAID sets. “Migration” refers to re-striping to effect a change in the fault-tolerance level or strip size, without adding any new disk members to the array.
When capacity is added to a RAID system by adding drives to an existing array, the data blocks and parity blocks must be re-striped across all drives in the array. Optionally, RAID migration to effect a change in fault-tolerance level may occur in parallel with the capacity expansion re-striping process. Migration may also be performed independently of capacity expansion to improve spindle usage in the array. Additional background information on capacity expansion and migration (collectively termed “reconfiguration”) may be found in U.S. Pat. No. 6,058,489, entitled “Online Disk Array Reconfiguration,” which is assigned to the assignee of the present invention, invented by the inventors of the present application, and which is incorporated herein by reference.
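The XOR-based rebuild described above can be sketched as follows. The data model is an assumption made for illustration: each stripe is a list with one `bytes` strip per member, and the function names are hypothetical. The sketch visits every stripe on the failed member, as a conventional rebuild does.

```python
def xor_blocks(blocks):
    """Bitwise XOR of a list of equal-length byte blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def rebuild_member(stripes, failed):
    """Reconstruct every strip of the failed member.

    stripes: list of stripes, each a list of per-member strips (bytes).
    failed:  index of the failed member; its entry in each stripe is ignored.
    Returns the strips to be written to the replacement drive, in order.
    """
    rebuilt = []
    for stripe in stripes:
        survivors = [s for i, s in enumerate(stripe) if i != failed]
        # XOR of the surviving data and parity strips yields the lost strip,
        # whether that strip held data or parity.
        rebuilt.append(xor_blocks(survivors))
    return rebuilt
```

Because the loop runs once per stripe, the work grows with the size of the member rather than with the amount of data actually stored on it.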
These operations are typically time consuming. While it is possible to perform these operations while the RAID system remains operational, performance and/or data protection during the rebuild, expansion, and migration processes are often degraded. In instances where the storage system must remain in a fault-tolerant state, these operations often require that the storage system be taken offline until the operations are completed. In any case, it is desirable to reduce the time required to perform rebuild, expansion, and migration operations.
Conventional systems perform these operations on each block in a volume, irrespective of whether that block contains any data and/or parity information. While this was acceptable when physical drives were relatively small, the operations can take a significant amount of time with modern hard drives that provide gigabytes of capacity. However, RAID controllers have not, until now, been able to distinguish between used and unused portions of the storage capacity. Hence, a need exists for systems, methods, controllers, and software that intelligently manage rebuild, expansion, and migration operations to reduce the time required for these operations on large physical drives.