In a data storage subsystem that stores data for a computer system, throughput and reliability are important system requirements. A RAID (Redundant Array of Inexpensive or Independent Disks) system meets these requirements. Viewed from a host computer, a RAID system, which has a plurality of hard disk drive devices (hereinafter referred to as HDDs), operates as if it were a single HDD. A RAID system is characterized in that data and corresponding parity data are stored together to improve reliability. That is, the exclusive OR of a set of data blocks is calculated, and the calculated result is stored as parity data. When a failure occurs in one of the disks constituting the system, the presence of this redundant data enables the data stored in the faulty disk to be reconstructed by calculating the exclusive OR of the data stored in the other disks.
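The parity calculation and the reconstruction described above can be illustrated with a minimal sketch. The function name xor_blocks and the sample two-byte blocks are hypothetical and chosen only for illustration; they do not come from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise exclusive OR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical contents of four data blocks in one row.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(data)

# Suppose the disk holding data[2] fails: XOR the surviving data blocks
# with the parity block to recover the lost block.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[2]
```

The reconstruction works because the exclusive OR of a value with itself is zero, so every surviving data block cancels out of the parity and only the missing block remains.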
In a RAID system, a series of data transferred from a host computer is distributed and stored (arranged) on a plurality of HDDs in block units of a predetermined data length (hereinafter referred to as data blocks). Such a rule for the distribution and arrangement of data is referred to as a striping rule. The number of HDDs constituting the system is a factor in determining the striping rule. Thus, when a user changes the number of HDDs constituting the system, the data must be redistributed and rearranged according to a new striping rule. A change in the number of HDDs typically occurs when an HDD is added. Conventionally, when an HDD is added, data has been rearranged on the plurality of HDDs in the system according to one of the following methods.
A first method is to rearrange the RAID system on the basis of the total number of HDDs after the addition. According to this method, first, a back-up of the data stored in the RAID system is saved in an auxiliary storage (not one of the HDDs constituting the RAID system). After the back-up is prepared, the RAID system is initialized and a striping rule based on the number of HDDs, inclusive of the added HDDs, is determined. Then, according to this striping rule, the back-up data is rearranged on the respective HDDs.
FIG. 1 is a diagram showing a change in the arrangement of data blocks in a two-dimensional (2-D) array when one HDD is added to a RAID system comprising 5 HDDs.
Referring to FIG. 1, data streams transferred from the host computer are divided into data blocks. Individual data blocks (Block 0 to Block 3) are written in sequence, block by block, to four HDDs (drive 0 to drive 3). The exclusive OR of these data blocks (Block 0 to Block 3), hereinafter referred to as a parity block (Block P), is written to the fifth HDD (drive 4). Row 0 thus comprises four data blocks (Block 0 to Block 3) and a parity block (Block P), which holds the parity of these data blocks. As shown in FIG. 1(a), data blocks and their parity block are likewise written to the respective HDDs for the subsequent rows.
When a sixth HDD (drive 5) is added to the system, the data is backed up, and thereafter the data is rearranged block by block on each HDD of the initialized system. Since the number of data blocks in Row 0 increases by one (Block 0' to Block 4'), the parity of the data in the five blocks is newly calculated to determine a parity block (Block P'). In a similar manner, the rearrangement of data blocks and the calculation and arrangement of a parity block are repeated for Row 1 and the subsequent rows (FIG. 1(b)).
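The effect of the changed striping rule on block placement can be sketched as follows. This is a simplified illustration under a stated assumption: the parity block always occupies the last drive of each row, as described for Row 0 of FIG. 1. The text does not say whether the parity position differs in later rows. The function names locate and moved_blocks are hypothetical.

```python
def locate(block, n_drives):
    """Map a logical data block number to (row, drive).

    Assumption: one drive per row holds parity and it is always the last
    drive, as described for Row 0 of FIG. 1; the other drives hold data.
    """
    if n_drives < 2:
        raise ValueError("need at least one data drive and one parity drive")
    data_drives = n_drives - 1
    # Quotient is the row, remainder is the drive within that row.
    return divmod(block, data_drives)


def moved_blocks(n_blocks, old_drives, new_drives):
    """List the data blocks whose (row, drive) position changes."""
    return [b for b in range(n_blocks)
            if locate(b, old_drives) != locate(b, new_drives)]


print(locate(4, 5))                 # (1, 0): Row 1, drive 0 with 5 HDDs
print(locate(4, 6))                 # (0, 4): Row 0, drive 4 with 6 HDDs
print(len(moved_blocks(20, 5, 6)))  # 16
```

Under this assumption, only the first four data blocks keep their positions when the sixth HDD is added, and even their row needs a newly calculated parity block, which is consistent with the full rearrangement described above.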
The second method is to newly construct a RAID system with added HDDs, separate from the RAID before the modification.
FIG. 2 is a diagram showing changes in the data blocks arranged in a 2-D array using the second method when three HDDs are added to a RAID system comprising 5 HDDs. Independently of the RAID (FIG. 2(a)) comprising 5 HDDs (drive 0 to drive 4), a RAID is constructed with three newly added HDDs (drive 5 to drive 7) (FIG. 2(b)) and this additional part is made accessible as a separate logical unit according to a different striping rule.
Such conventional methods have the following problems. In the first method, because the RAID is newly reconstructed to include the added HDDs, the host computer cannot access the system while the data is erased and the system is being initialized. Since RAID systems are required to be available at all times, the initialization of the RAID system is a serious problem. In addition, because a large-capacity auxiliary storage is necessary for the back-up of data, the cost becomes high.
In the second method, since the RAID is divided into two or more systems, the performance is lower than that of a single RAID system having the same number of HDDs, for two reasons. First, the larger the number of HDDs constituting the RAID, the smaller the number of accesses per HDD, because the data is distributed over more HDDs. In the example of FIG. 2, the performance is higher for a single RAID system comprising 8 HDDs than for two RAID systems comprising 5 HDDs and 3 HDDs, respectively. Second, controlling a plurality of RAID systems is complicated, which also lowers performance.
Thus, it is a first object of the present invention to provide a method enabling a new storage device to be added to a system without having to erase the data stored in the system.
It is a second object of the present invention to prevent the lowering of system performance even when the constitution of the system is modified by the addition of a new storage device to the system.