This invention relates generally to scaling with minimal data movement in large data storage systems having a plurality of storage blocks organized as stripes with redundancy and, more specifically, to a method for expanding a data storage array by adding new storage blocks while conserving state during the minimal data movements required to reorganize the expanded data storage system.
In a data storage subsystem storing data for a computer system, throughput and reliability are important system requirements. A Redundant Array of Inexpensive or Independent Disks (herein denominated RAID or merely “array”) system meets these requirements. Viewed from a host computer, a RAID system, having a plurality of hard disk drive devices (herein denominated HDDs or merely “disks”), operates as a single logical disk. For example, a RAID-5 system is characterized in that data and corresponding parity data are stored together to improve the reliability. That is, a common exclusive-OR function of a set of N data blocks (an N+P “stripe”) is calculated and stored as a parity data block (P). When a failure occurs in one of the disks constituting the RAID-5 system, the presence of redundant data enables the data stored in the faulty disk to be reconstructed: for each stripe, the single block lost with the faulty disk is recovered by calculating the exclusive OR of the corresponding blocks stored in the other disks.
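The parity relationship described above may be illustrated by the following minimal sketch. The block contents, the 4-byte block size, and the helper name `xor_blocks` are assumptions chosen only for illustration and do not come from any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise exclusive OR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical (4+P) stripe: four data blocks of four bytes each.
data = [
    b"\x01\x02\x03\x04",
    b"\x10\x20\x30\x40",
    b"\xaa\xbb\xcc\xdd",
    b"\x0f\x0e\x0d\x0c",
]
parity = xor_blocks(data)  # parity block P of the stripe

# Simulate the failure of the disk that held data[2]: the lost block is
# the exclusive OR of the surviving data blocks and the parity block.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])
```

Because the exclusive OR of all N+1 blocks of a stripe is zero, the same routine recovers any single lost block, whether it held data or parity.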
In a RAID system, a series of logical block addresses (LBAs) are distributed and stored (arranged) on a plurality of HDDs in block units of a predetermined data length (herein denominated a data block). Such a rule of distribution and arrangement of data is commonly denominated a “striping rule.” The number of HDDs embraced by the RAID system is a factor for determining the striping rule. Thus, when a user changes the number of HDDs in the system, data must be redistributed and rearranged according to a new striping rule. A change in the number of HDDs occurs typically when an HDD is added. Conventionally, when an HDD is added, data are rearranged on a plurality of HDDs in the system according to the following methods.
A first method is to rearrange the RAID system on the basis of the total number of HDDs after the addition. According to this method, a back-up of all data is written by the existing RAID system (having a “source configuration”) into an auxiliary storage external to the RAID system before the new RAID system is initialized and a new striping rule (destination configuration) based on the number of HDDs inclusive of the added HDDs is determined. Then, according to this new striping rule (destination configuration), the back-up data are written to the respective HDDs in the new RAID system. The auxiliary storage backup is usually accomplished regularly during normal operation, but even if no time is required to update the backup files, the rearrangement is very time-consuming, requiring hours to complete.
FIG. 1, comprising FIGS. 1A and 1B, is a diagram illustrating a reconfiguration of the data blocks in an array when adding one disk to convert a (4+P) RAID-5 system to a (5+P) RAID-5 system, according to a first method from the prior art.
Referring to FIG. 1, data streams transferred from a host computer are divided into data blocks. Individual data blocks are written to four HDDs (D1 to D4) in sequence at the block unit (Block 1 to Block 4). And in the fifth HDD (D5), the exclusive OR of these data blocks (Block 1 to Block 4) (hereinafter, referred to as parity Block P) is written. Row R1 includes four data blocks (Block 1 to Block 4) and a computed parity block (Block P), which is the parity of these four data blocks. As shown in FIG. 1A, data blocks and the related parity block are also written for the subsequent rows in the respective HDDs with the usual left-symmetric parity rotation.
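The block placement described above may be sketched as follows. The sketch assumes the common left-symmetric convention in which the parity block moves one disk to the left in each successive row and the data blocks of a row begin on the disk immediately after the parity block, wrapping around; the function name `left_symmetric_layout` is hypothetical.

```python
def left_symmetric_layout(num_disks, num_rows):
    """Return one list per row; entry d is the block stored on disk d+1.

    Data blocks are numbered from 1 and the parity block is shown as "P".
    Assumes left-symmetric rotation (an assumption of this sketch).
    """
    data_per_row = num_disks - 1
    layout = []
    for row in range(num_rows):
        parity_disk = (num_disks - 1 - row) % num_disks
        cells = [None] * num_disks
        cells[parity_disk] = "P"
        for k in range(data_per_row):
            # Data starts on the disk after the parity block and wraps.
            cells[(parity_disk + 1 + k) % num_disks] = row * data_per_row + k + 1
        layout.append(cells)
    return layout

# (4+P) source configuration on five disks, five rows (Blocks 1 to 20).
for cells in left_symmetric_layout(5, 5):
    print(cells)
```

Under this assumption the first row reads Blocks 1 to 4 followed by P on D5, and the second row places P on D4 with Block 5 on D5 and Blocks 6 to 8 on D1 to D3.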
When the sixth HDD D6 is added to the system, the data in Blocks 1-20 are backed up and thereafter the rearrangement is carried out in each HDD of the initialized system at the block level according to a new (5+P) striping rule. Because the number of blocks in Row R1 increases by one (Block 1′ to Block 5′), parities of data in the five blocks are newly calculated to determine a parity block (Block P′). In a similar manner, rearrangement of data blocks and calculation and arrangement of a parity block are repeatedly executed for Rows R2-R6 with the usual left-symmetric parity rotation substantially as shown in FIG. 1B.
In some data storage systems, the RAID system can be adjusted from the layout of FIG. 1A to that in FIG. 1B by moving data blocks within the array according to a “RAID extension” process known in the art. Such an approach has two striping layouts, and at least one boundary, but may be accomplished without the use of external storage. Despite this feature, completing the data movement still requires a very long time (many hours) because almost every block in the array must be moved. The data are usually backed up to protect against a system error during the “RAID extension” process but the process does not require an external data store. In the above example (FIG. 1), all data from five disks must be read and the contents of six disks written (assuming the sixth disk starts empty) to obtain consistent parity. Essentially, the entire array must be rewritten.
This RAID extension process is favored in the art because the data storage efficiency is maximized, which many practitioners consider desirable. As used herein, the term “data storage efficiency” denotes the ratio of the total non-parity data storage capacity divided by the total data storage capacity of the data storage system, which in this example is increased from 80% (4/5) in FIG. 1A to 83.33% (5/6) in FIG. 1B. Without using an external backup store, the rearrangement example shown in FIG. 1 may be accomplished, for example, by the following steps:
(a) leave Blocks 1-4 in position;
(b) compute P′ from Blocks 1-5 and write P′ to D6R1;
(c) move Block 5 from D5R2 to D5R1, which is available because of the new P′ in D6R1;
(d) move Block 6 from D1R2 to D6R2, which is empty;
(e) move Block 7 from D2R2 to D1R2;
(f) move Block 8 from D3R2 to D2R2;
(g) move Block 9 from D4R3 to D3R2;
(h) compute P′ from Blocks 6-10 and write P′ to D5R2, which was earlier vacated by Block 5;
(i) move Block 10 from D5R3 to D4R2, which is available because of the new P′ in D5R2; and so forth in this manner using the empty (unallocated) space on disk 6 for temporary storage.
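The extent of the data movement implied by the steps above may be estimated with the following sketch, which compares the position of each data block under the (4+P) and (5+P) striping rules. It assumes left-symmetric parity rotation for both configurations and the 20 data blocks of the example; the helper name `block_positions` is hypothetical.

```python
def block_positions(num_disks, num_blocks):
    """Map each data block number to its (disk, row), both counted from 1.

    Assumes left-symmetric rotation: parity moves one disk to the left per
    row and data begins on the disk after the parity block, wrapping.
    """
    data_per_row = num_disks - 1
    positions = {}
    for block in range(1, num_blocks + 1):
        row, k = divmod(block - 1, data_per_row)
        parity_disk = (num_disks - 1 - row) % num_disks
        disk = (parity_disk + 1 + k) % num_disks
        positions[block] = (disk + 1, row + 1)
    return positions

source = block_positions(5, 20)       # (4+P) on disks D1-D5
destination = block_positions(6, 20)  # (5+P) on disks D1-D6

moved = [b for b in source if source[b] != destination[b]]
for b in moved[:6]:
    (sd, sr), (dd, dr) = source[b], destination[b]
    print(f"Block {b}: D{sd}R{sr} -> D{dd}R{dr}")
print(f"{len(moved)} of {len(source)} data blocks change position")
```

Under these assumptions only Blocks 1 to 4 remain in place, so 16 of the 20 data blocks are relocated, and every parity block must also be recomputed.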
Although not commonly used in the art, another method is to newly construct a RAID system with added HDDs that are configured with a striping rule independently from the existing RAID before modification. FIG. 2 is a diagram illustrating a reconfiguration of the data blocks in an array created by adding three disks to a 5-disk (4+P) RAID-5 system to create an 8-disk RAID operating with two independent (4+P) and (2+P) striping rules. Independently of the existing RAID source configuration (FIG. 2A) including five HDDs (D1-D5), a second (destination) RAID (FIG. 2B) is formed by adding three new HDDs (D6-D8) made accessible as a separate logical unit according to a different (2+P) striping rule.
Such conventional methods have the following problems. In the first method of FIG. 1, the host computer cannot access this system when data has been erased and the system is initialized. Because RAID systems are expected to be always available, the down-time required for reconfiguration and initialization of the RAID system is a serious problem. Also, the requisite auxiliary storage of large capacity for the temporary back-up of data adds to the cost of the data storage system.
In the second method (FIG. 2), as the RAID is divided into two or more independently configured systems, system performance is reduced compared to a single RAID system having the same number of HDDs, for two reasons. First, distributing data over a larger number of HDDs reduces the number of accesses that each HDD must serve. Thus, in the example of FIG. 2, the data performance is higher for a single 8-HDD RAID system than for separate 5-HDD and 3-HDD RAID systems. Secondly, the complexity of controlling a plurality of separate RAID systems lowers overall data storage system performance. In addition, the “data storage efficiency” is decreased from 80% in FIG. 2A to 75% in FIG. 2B, which is generally undesirable in the storage arts.
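The data storage efficiency figures for the arrangement of FIG. 2 follow directly from the definition given earlier (non-parity capacity divided by total capacity). The following sketch assumes disks of equal capacity; the function name `storage_efficiency` is hypothetical.

```python
from fractions import Fraction

def storage_efficiency(groups):
    """Ratio of data disks to all disks over a list of independent arrays.

    Each group is a (data_disks, parity_disks) pair; equal disk capacity
    is assumed.
    """
    data = sum(d for d, _ in groups)
    total = sum(d + p for d, p in groups)
    return Fraction(data, total)

before = storage_efficiency([(4, 1)])          # single (4+P) array
after = storage_efficiency([(4, 1), (2, 1)])   # (4+P) plus separate (2+P)
print(f"{float(before):.2%} -> {float(after):.2%}")
```

The second array contributes two data disks out of three, which pulls the combined ratio down from 4/5 to 6/8.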
Accordingly, there is a well-known and universal need to improve the reliability and speed of procedures for increasing the storage capacity of existing data storage systems as the needs increase. Standard RAID arrays, such as RAID-5 and RAID-6, make this a tedious prospect. The user must either add a new array or perform a lengthy data element reconfiguration process. For example, extending a RAID-5 system from seven disks to eight disks requires reading the entire data contents of the array (six disks' worth, excluding redundant parity data) before re-writing all of the data and parity onto the eight disks. During this expansion (scaling) process, the existing data may be available from an in-situ copy but the new data storage system capacity is unavailable to the host computer until the process is completed.
The art is replete with proposals for resolving some of these problems. For example, in U.S. Pat. No. 6,304,941, Lyons et al. disclose a method and apparatus for reducing processor operations when adding a new drive to a RAID-6 drive group. Their method reduces the number of transactions that take place between the RAID subsystem controller and the RAID device during the installation of a new drive by transferring the installation process from the controller to the new drive. Their system is also directed towards reducing the time required to install a new drive to a RAID system by allowing the multiple drive processor to accomplish the installation. Their method reduces the time to install a new drive to a RAID device by following the individual processes to accomplish the installation. The installation is accomplished in parallel with each drive managing the rearrangement of the data segments on every other drive. This frees the controller from managing the rearrangement of the data segments.
As another example, in U.S. Pat. No. 6,347,359, Smith et al. disclose a method for reconfiguration of RAID data storage systems. Their system optimizes the reconfiguration by determining if a combination of changes to system parameters and possible rebuilding operations can replace the migration process, and, if this is possible, the reconfiguration process is modified to eliminate data migration. The array controller of the system pursues the various level of optimization by changing parameters stored in the reserved storage areas without the need for data migration.
In U.S. Pat. No. 5,991,804, Bolosky et al. disclose a method for reconfiguring the file server in an efficient manner following a change in system capacity. The controller performs the reconfiguration in multiple phases, and the data servers perform the last two phases in parallel. The order of the last two phases depends upon whether one or more storage disks are being added or removed from the system. Their method pursues a second layout strategy to optimize the starting locations of the data files so that each data file starts on the disk and results in moving the least number of data blocks during the second phase of the re-striping process.
In U.S. Pat. No. 6,901,479, Tomita discloses a method for expanding the storage capacity dynamically by adding a disk drive. The disk array includes a plurality of disk drives. The disk array has a redundant disk configuration so that, even if any one of the disk drives fails, the data in the troubled disk drive can be regenerated (or recovered). The controller writes simultaneously in the disk array of data on the write buffer and generates one stripe segment of parity data. As the disk drive has been added to the disk array, the data is written simultaneously into an empty stripe of the disk array after the disk drive is added, according to a new striping rule that corresponds to the disk array after the expansion of the storage capacity. When writing the data simultaneously into the disk array, the controller updates the restructured address translation table.
As a further example, in U.S. Pat. No. 6,035,373, Iwata discloses a method for rearranging data in a disk array system when a new disk storage unit is added to the array. The number of data storage units may be increased without requiring the back-up of stored data. In addition, after the completion of data rearrangement, a system including the newly added data storage units can be composed. Thus, the load per data storage unit can be reduced and the performance of data accesses can be improved.
Thus, there is still a clearly-felt need to increase the storage capacity of disk systems during operation as the user's storage needs increase without the downtime needed for the tedious data transfers required in the art for standard RAID systems. There is also a clearly-felt need for a method that minimizes the required data movement when adding a disk to an existing RAID system. Further, there is a clearly-felt need for a method that instantly makes available the expansion capacity without a delay for completion of the necessary data movement and without risk of data loss upon failure of any system components.