The present invention relates to a method of updating error correcting codes and to a disk array device suitable for the updating method, the disk array device comprising a plurality of disks for holding data and a disk for holding error correcting codes for the data held in the plurality of disks.
In a present day computer system, the data needed by a higher order device, such as a CPU, is stored in a secondary storage, and the CPU performs read or write operations on the secondary storage as occasion demands. In general, a nonvolatile storage medium is used for the secondary storage; representative examples are a disk device (hereinafter referred to as a disk drive) and a disk in which a magnetic material or a photo-electromagnetic material is used.
In recent years, with the advancement of computerization, a secondary storage of higher performance has been required. As a solution, there has been proposed a disk array constituted with a plurality of drives of comparatively small capacity. For example, reference is made to "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by D. Patterson, G. Gibson, and R. H. Katz, presented at the ACM SIGMOD Conference, Chicago, Ill. (June 1988), pp. 109 to 116.
In a disk array which is composed of a large number of drives, the probability of occurrence of a failure becomes high because of the large number of parts. Therefore, for the purpose of increasing the reliability, the use of an error correcting code has been proposed. That is, an error correcting code is generated from a group of data stored in a plurality of data disks, and the error correcting code is written to a disk different from the plurality of data disks. When a failure occurs in any of the data disks, the data in the failed disk is reconstructed using the data stored in the remaining normal disks and the error correcting code. A group of data to be used for generating the error correcting code is called an error correcting data group. Parity is the scheme most often used for an error correcting code. In the following, parity is used as the error correcting code except under special circumstances; however, it will be apparent that the present invention is also effective in a case where an error correcting code other than one based on parity is used. When parity is used as the error correcting code, the error correcting data group can also be referred to as a parity group.
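The generation of parity from a parity group, and the reconstruction of the data of a failed disk from the remaining disks and the parity, can be sketched as follows. This is a minimal illustration only, assuming byte-wise exclusive-OR parity over equally sized blocks; the function names and the sample data are hypothetical and do not appear in the cited documents.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise exclusive-OR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Parity of one parity group: the XOR of all of its data blocks."""
    return xor_blocks(data_blocks)

def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

# Hypothetical parity group held on four data disks.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = make_parity(data)

# Assume the third data disk fails; rebuild its block from the rest.
failed = 2
survivors = [block for i, block in enumerate(data) if i != failed]
recovered = reconstruct(survivors, parity)
```

Because every block other than the missing one appears twice in the combined exclusive-OR, those blocks cancel and only the missing block remains. This holds only when a single disk of the parity group has failed.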
The above-mentioned document reports the results of a study concerning the performance and the reliability of a disk array (level 3) in which write data is divided and written to a plurality of drives in parallel, and of a disk array (level 4 and 5) in which data is distributed and handled independently.
In a present day large scale general purpose computer system or the like, in a secondary storage, constituted with disk drives, the addresses of individual units of data which are transferred from a CPU are fixed to predetermined addresses, and when the CPU performs reading or writing of the data, the CPU accesses the fixed addresses. The same thing can be said about a disk array.
In the case of a disk array (level 3) in which data is divided and processed in parallel, the fixing of addresses exerts no influence upon the disk array; however, in the case of a disk array (level 4 and 5) in which data is distributed and handled independently, a data writing process incurs a large overhead when the addresses are fixed. This overhead has been explained in Japanese Patent Application No. Hei 4-230512; it will also be explained briefly in the following for the case of a disk array (level 4) in which data is distributed and handled independently.
In FIG. 15A, each address (i,j) is an address for a unit of data which can be processed in a single read/write access.
Parity is generated from the four units of data at the address (2,2) in each of the drives from No. 1 to No. 4, and the parity is stored at the corresponding address (2,2) in the drive No. 5, which is used for storing parity. For example, when the data at the address (2,2) in drive No. 3 is to be updated, first, the old data before the update at the address (2,2) in drive No. 3 and the old parity at the address (2,2) in the parity drive No. 5 are read (1).
An exclusive-OR operation on the read old data, the old parity, and the updated new data is carried out to generate a new parity (2). After the generation of the new parity is completed, the updated new data is stored in the address (2,2) in the drive No. 3, and the new parity is stored in the address (2,2) in the drive No. 5 (3).
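Steps (1) to (3) can be sketched as follows. The sketch assumes one-byte units and byte-wise exclusive-OR parity; the dictionary `drives`, the function names, and the data values are hypothetical and serve only to show that the new parity computed from the old data, the old parity, and the new data equals the parity recomputed over the whole parity group.

```python
def xor_bytes(a, b):
    """Byte-wise exclusive-OR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def update_parity(old_data, old_parity, new_data):
    """Step (2): new parity = old data XOR old parity XOR new data."""
    return xor_bytes(xor_bytes(old_data, old_parity), new_data)

def full_parity(blocks):
    """Parity recomputed from every data block of the parity group."""
    result = bytes(len(blocks[0]))
    for block in blocks:
        result = xor_bytes(result, block)
    return result

# Hypothetical contents of address (2,2) in data drives No. 1 to No. 4.
drives = {1: b"\x11", 2: b"\x22", 3: b"\x33", 4: b"\x44"}
parity_drive = full_parity([drives[n] for n in (1, 2, 3, 4)])

# Step (1): read the old data in drive No. 3 and the old parity.
old_data, old_parity = drives[3], parity_drive
new_data = b"\x5a"

# Step (2): generate the new parity without reading drives No. 1, 2, or 4.
new_parity = update_parity(old_data, old_parity, new_data)

# Step (3): store the new data and the new parity.
drives[3] = new_data
parity_drive = new_parity
```

Only two drives are read and two are written, regardless of how many data drives belong to the parity group; this is why a write at level 4 or 5 touches the data drive and the parity drive but leaves the other drives free.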
In the case of a disk array of level 5, in order to read out the old data and the old parity from the drive on which the data is stored and from the drive on which the parities are stored, a rotational latency of 1/2 turn on the average is incurred, after which the old data and the old parity are read out from the disks to generate a new parity.
As shown in FIG. 15B, one turn is needed to write the newly generated parity at the address (2,2) in the drive No. 5. A latency time is likewise needed to write the new data at the address (2,2) in the drive No. 3. In conclusion, rewriting data requires a latency time of at least 1.5 turns. In the case of RAID level 4, since the parities for the data in a plurality of parity groups are stored on the same disk, the latency time of one turn needed when a new parity is written degrades the write performance. Even if the write time of new data increases, access to the data stored on other disks can, in principle, be performed independently, so that the influence of the data write overhead is smaller than that of the overhead involved in updating the parity.
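The figure of 1.5 turns is the sum of the average latency of 1/2 turn before the old data and old parity can be read and the one full turn before the same address comes around again for writing. The arithmetic can be sketched as follows; the rotational speed of 3600 rpm is an assumed value chosen only for illustration and is not taken from the text.

```python
def rewrite_latency_turns(avg_read_latency_turns=0.5, rewrite_turns=1.0):
    """Rotational latency of one update: average wait before reading the
    old block plus one full turn before the same address can be rewritten."""
    return avg_read_latency_turns + rewrite_turns

def turns_to_ms(turns, rpm):
    """Convert a number of disk turns to milliseconds at a given speed."""
    return turns * 60_000.0 / rpm

turns = rewrite_latency_turns()
latency_ms = turns_to_ms(turns, rpm=3600)  # 3600 rpm is an assumption
```

Under this assumed speed one turn takes about 16.7 ms, so each parity update occupies the parity drive for roughly 25 ms of rotational latency alone, before any seek or transfer time is counted.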
In order to reduce the write overhead described above, a dynamic address translation method may be employed, as disclosed in PCT International Application laid open under WO 91/20076, filed by Storage Technology Corporation (hereinafter referred to as STK).
In Japanese Patent Application No. Hei 4-230512, applied for by IBM, there is disclosed a method for reducing the write overhead by writing data at an address other than the address at which the write data is to be written.
On the other hand, in recent years, a flash memory has been suggested as a replacement for the magnetic disk. A flash memory is a nonvolatile semiconductor memory, and the reading or writing of data in the flash memory can be performed faster than in a magnetic disk. In the case of a flash memory, however, when data is to be written, other data existing at the receiving address has to be erased first. In the case of a representative flash memory, the write time or the read time is on the order of 100 ns, similar to the case of a RAM, but an erase takes about 10 ms. Also, there is a limit to the number of times writing may be carried out; the limit is said to be about one million times, which is regarded as a problem in practical use.
In order to solve the above-mentioned problem concerning the limit on the number of times a flash memory can be written, IBM discloses, in Japanese Patent Application No. Hei 5-27924, a method in which address translation is performed at write time with the use of a mapping table so that the number of writes is averaged across the flash memories.
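The general idea of averaging writes through a mapping table can be sketched as follows. This is not the method of the cited application, whose details are not given here; it is a simplified illustration in which each write of a logical block is redirected to the free physical block with the fewest erasures. The class name, its fields, and the block counts are all hypothetical.

```python
class WearLevelingMap:
    """Toy logical-to-physical mapping table that spreads erasures evenly."""

    def __init__(self, num_physical_blocks):
        self.erase_counts = [0] * num_physical_blocks
        self.free = set(range(num_physical_blocks))
        self.mapping = {}  # logical block number -> physical block number
        self.store = {}    # physical block number -> stored data

    def write(self, logical, data):
        if not self.free:
            raise RuntimeError("no free physical block available")
        # Pick the least-erased free block; ties go to the lowest number.
        target = min(self.free, key=lambda p: (self.erase_counts[p], p))
        self.free.remove(target)
        self.store[target] = data
        old = self.mapping.get(logical)
        self.mapping[logical] = target
        if old is not None:
            # The previous copy is erased and returned to the free pool.
            del self.store[old]
            self.erase_counts[old] += 1
            self.free.add(old)

    def read(self, logical):
        return self.store[self.mapping[logical]]

# Rewrite the same logical block eight times over four physical blocks.
flash = WearLevelingMap(num_physical_blocks=4)
for i in range(8):
    flash.write(0, bytes([i]))
```

Without the mapping table, all eight writes would erase the same physical block; with it, the erasures are distributed so that no block is erased more than once more than any other.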