1. Field of the Invention
The present invention relates to a storage system having a large-capacity storage apparatus composed of a combination of storage devices such as a plurality of magnetic disk devices, to a control method thereof, and to a program. More particularly, it relates to a storage system, a control method thereof, and a program that enhance reliability against failure by making the data to be recorded in the storage apparatus redundant and encoding it.
2. Description of the Related Art
Conventionally, RAID (Redundant Arrays of Independent Disks) has been the generally used technique for protecting data on the media of magnetic disk devices against failures, unexpected accidents, and the like. RAID is defined in levels from RAID level 0 to RAID level 6 according to its use, and the levels typically used to enhance data safety are RAID 1, RAID 5, and RAID 6. RAID 1 is generally realized with two disks. It is the simplest method: the failure tolerance of the disks is enhanced by writing the same data to both disks, which is called mirroring.
FIG. 1 shows a storage system to which RAID 1 is applied. For example, five magnetic disk devices 104-1 to 104-5 are disposed under a controller 100 as storage devices, and each of the magnetic disk devices 104-1 to 104-5 holds respective replicated data 102-1 to 102-5 of original data 102, which is composed of data blocks A to H.
RAID 5 is a method in which data is recorded distributed across a plurality of disks; upon writing, a redundant code called parity, obtained by exclusive OR (bitwise addition without carry) of the data, is generated and written at the same time. Accordingly, even if any one of the disks fails, the complete original data can be restored from the data and parity held on the remaining disks.
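The single-failure recovery property of parity follows from the fact that XOR is its own inverse: XOR-ing every surviving block of a stripe, parity included, reproduces the one missing block. The following is a minimal sketch of that idea, not a controller implementation; the helper names `xor_blocks` and `rebuild_missing` and the 4-byte block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise exclusive OR of equal-length blocks; this is how the parity is formed."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same length")
    # zip(*blocks) walks the blocks column by column, yielding one int per block.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recover the single lost block of a stripe by XOR-ing all surviving blocks."""
    return xor_blocks(*surviving)


# Hypothetical 4-byte blocks standing in for data blocks A to D.
A, B, C, D = b"AAAA", b"BBBB", b"CCCC", b"DDDD"
parity = xor_blocks(A, B, C, D)

# Suppose the disk holding C fails: XOR of the remaining data and the parity yields C.
restored = rebuild_missing([A, B, D, parity])
print(restored)  # b'CCCC'
```

The same call rebuilds a lost parity block, since the parity is just one more member of the XOR.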
FIG. 2 shows a storage system to which RAID 5 is applied. For example, the five magnetic disk devices 104-1 to 104-5 are disposed as storage devices under the controller 100.
For the data blocks A to H of the original data 102, a parity PABCD is first calculated as the exclusive OR of the data blocks A to D, and the data blocks A to D and the parity PABCD are stored in the magnetic disk devices 104-1 to 104-5; subsequently, a parity PEFGH is calculated as the exclusive OR of the data blocks E to H, and the data blocks E to H and the parity PEFGH are likewise stored in the magnetic disk devices 104-1 to 104-5. The parities PABCD and PEFGH are stored in different devices, namely the magnetic disk devices 104-5 and 104-4, respectively. In RAID 5, the disk capacity required for recording the parities corresponds to one disk regardless of the number of disks; therefore, utilization efficiency is higher than in RAID 1. RAID 6 is an extension of RAID 5 in which two parities are generated instead of the single parity of RAID 5. Accordingly, the original data can be restored even if two magnetic disk devices fail at the same time. Although there are several methods for obtaining the second parity, a Reed-Solomon code is generally employed.
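The RAID 5 placement of FIG. 2, with PABCD on the magnetic disk device 104-5 and PEFGH on 104-4, can be reproduced by rotating the parity position one disk to the left per stripe. This is a sketch under assumptions the text does not state: that the rotation continues in the same direction for later stripes and that data blocks fill the non-parity disks from left to right. The function name `raid5_layout` is hypothetical, and disk index 0 corresponds to 104-1.

```python
def raid5_layout(blocks: list[str], n_disks: int) -> list[list[str]]:
    """Place blocks stripe by stripe; row[i] is what disk i holds in that stripe.

    Assumptions (illustrative only): the parity position rotates from the last
    disk toward the first, and data blocks fill the other disks left to right.
    """
    per_stripe = n_disks - 1
    if len(blocks) % per_stripe:
        raise ValueError("this sketch only handles full stripes")
    layout = []
    for stripe_no, start in enumerate(range(0, len(blocks), per_stripe)):
        data = blocks[start:start + per_stripe]
        parity_disk = (n_disks - 1) - (stripe_no % n_disks)
        row = list(data)
        row.insert(parity_disk, "P" + "".join(data))  # parity label, e.g. "PABCD"
        layout.append(row)
    return layout


n = 5
for row in raid5_layout(list("ABCDEFGH"), n):
    print(row)
# ['A', 'B', 'C', 'D', 'PABCD']
# ['E', 'F', 'G', 'PEFGH', 'H']
print(f"usable capacity: {(n - 1) / n:.0%}")  # usable capacity: 80%
```

Rotating the parity spreads parity writes over all disks, while the parity overhead stays at one disk's worth of capacity whatever the number of disks.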
FIG. 3 shows a storage system to which RAID 6 is applied. For example, the controller 100 stores the data blocks A to C in the magnetic disk devices 104-1 to 104-3, calculates a parity PABC as the exclusive OR of the data blocks A to C, and stores it in the magnetic disk device 104-4. Furthermore, a second parity Q1 is calculated from the data blocks A to C and the parity PABC by use of a Reed-Solomon code and stored in the magnetic disk device 104-5. Since RAID 6 has one more parity than RAID 5, the original data can be recovered even when two units in the storage system fail at the same time. However, because two parities are generated, utilization efficiency is lower than that of RAID 5 by an amount corresponding to one disk, and the amount of calculation is large because a Reed-Solomon code is used to obtain the second parity. Moreover, as a conventional file control method for a storage system, there is a method that controls a file by changing the RAID level according to the file size (JP 3505093 1). In this method, a file smaller than one block is stored by the redundancy method of RAID level 1, and a file of two blocks or more is stored by the redundancy method of RAID level 5.
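To make the extra computation of the second parity in FIG. 3 concrete, the sketch below uses one common P+Q formulation rather than the exact Q1 construction of the figure: P is the plain XOR of the data blocks, and Q is a Reed-Solomon-style weighted sum computed over the data blocks only, Q = g^0·D0 ⊕ g^1·D1 ⊕ ..., in the finite field GF(2^8). The field polynomial 0x11D, the generator g = 2, the byte contents, and the function names are assumptions for illustration.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1: an assumed primitive polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as elements of GF(2^8): carry-less multiply, reduced by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_pow(base: int, exp: int) -> int:
    """base**exp in GF(2^8) by repeated multiplication (fine for small exponents)."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def pq_parity(data_blocks: list[bytes]) -> tuple[bytes, bytes]:
    """P = D0 ^ D1 ^ ... ;  Q = g^0*D0 ^ g^1*D1 ^ ... with generator g = 2."""
    length = len(data_blocks[0])
    p, q = bytearray(length), bytearray(length)
    for i, block in enumerate(data_blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte                 # first parity: plain XOR, as in RAID 5
            q[j] ^= gf_mul(coeff, byte)  # second parity: weighted sum in GF(2^8)
    return bytes(p), bytes(q)


# Hypothetical 2-byte contents for data blocks A, B, C (disks 104-1 to 104-3).
A, B, C = b"\x01\x02", b"\x10\x20", b"\xff\x00"
P, Q = pq_parity([A, B, C])
print(P.hex(), Q.hex())  # ee22 fa42
```

Each byte of Q costs a field multiplication per data block instead of a single XOR, which is the source of the larger calculation amount noted above.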
Furthermore, as a method for correcting data in a storage-system array, there is a method in which, when two data DASDs are defective, their data is regenerated as a function of a pair of syndromes constituting two Boolean equations in the unknown values (JP 05-197579). In this method, data reproduction is facilitated and the write-processing load over the entire array is balanced by storing a matrix of powers of the polynomial representation of the code's primitive element and performing pipeline processing.
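The general idea of regenerating two lost blocks from a pair of syndromes (two equations in two unknowns) can be shown with a P+Q code. This is not the method of JP 05-197579, whose details are not given here; it is a minimal sketch assuming P = ⊕Di and Q = ⊕g^i·Di over GF(2^8) with polynomial 0x11D and generator g = 2. With data blocks x and y lost, the survivors give Pxy = Dx ⊕ Dy and Qxy = g^x·Dx ⊕ g^y·Dy, so Dx = (Qxy ⊕ g^y·Pxy) / (g^x ⊕ g^y) and Dy = Pxy ⊕ Dx. All names and block contents are hypothetical.

```python
POLY = 0x11D  # assumed field polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply of two bytes reduced modulo POLY (GF(2^8) product)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_pow(base: int, exp: int) -> int:
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def gf_inv(a: int) -> int:
    """Inverse in GF(2^8): a^254, since a^255 = 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


def syndromes(blocks: list, length: int) -> tuple[bytearray, bytearray]:
    """Return (XOR of D_i, XOR of g^i * D_i) over present blocks; None marks a failed disk."""
    s0, s1 = bytearray(length), bytearray(length)
    for i, block in enumerate(blocks):
        if block is None:
            continue
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            s0[j] ^= byte
            s1[j] ^= gf_mul(coeff, byte)
    return s0, s1


def recover_two(blocks: list, x: int, y: int, p: bytes, q: bytes) -> tuple[bytes, bytes]:
    """Solve Dx ^ Dy = Pxy and g^x*Dx ^ g^y*Dy = Qxy for the lost blocks Dx, Dy."""
    s0, s1 = syndromes(blocks, len(p))
    pxy = bytes(a ^ b for a, b in zip(p, s0))  # equals Dx ^ Dy
    qxy = bytes(a ^ b for a, b in zip(q, s1))  # equals g^x*Dx ^ g^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)  # nonzero because x != y and g = 2 generates the field
    dx = bytes(gf_mul(qv ^ gf_mul(gy, pv), inv) for pv, qv in zip(pxy, qxy))
    dy = bytes(pv ^ d for pv, d in zip(pxy, dx))
    return dx, dy


data = [b"Disk", b"one!", b"two.", b"3333"]  # hypothetical stripe of four data blocks
p, q = syndromes(data, 4)                     # with nothing missing, these are P and Q
damaged = [data[0], None, data[2], None]      # the disks holding blocks 1 and 3 have failed
print(recover_two(damaged, 1, 3, bytes(p), bytes(q)))  # (b'one!', b'3333')
```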
However, storage systems that ensure data redundancy by applying conventional RAID have the following problems. First, in RAID 1 mirroring, a copy of the original data is created on each magnetic disk device; therefore, although failure tolerance is high, the utilization efficiency of the disks is extremely poor. Second, although the utilization efficiency of the disks in RAID 5 is higher than in RAID 1, data can be restored only when a single disk fails, and restoration is impossible when two or more units fail at the same time. Third, in RAID 6 the utilization efficiency is lower than in RAID 5 by an amount corresponding to one disk because two parities are generated. Moreover, the mean time between failures MTBF (expressed in hours) is used as an indication of the failure rate of magnetic disks. The MTBF represents the mean time from the recovery of a product until its next failure. For example, an MTBF of 400,000 hours means that a disk fails on average once every 400,000 hours; when 10 magnetic disk devices are used simultaneously in RAID or the like, one of them fails on average every 40,000 hours. Since all the magnetic disk devices are treated equivalently in RAID, magnetic disk devices with the same performance should be used as far as possible. However, the MTBF of the magnetic disk devices decreases with time, so when a particular magnetic disk device is replaced, the difference between the replacement device and the remaining devices grows. In such a situation, the reliability of the RAID is greatly reduced.
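The MTBF example and the utilization comparison reduce to simple arithmetic. The sketch below assumes independent disk failures at a constant rate (an exponential lifetime model, which the text does not state); under that assumption the failure rates of n disks add, so the mean time to the first failure among them is MTBF/n. RAID 1 is modeled as in FIG. 1, where every disk holds a full copy, and the RAID 5 and RAID 6 overheads are one and two disks, respectively. The function names are hypothetical.

```python
def time_to_first_failure(mtbf_hours: float, n_disks: int) -> float:
    """Mean time until the first of n disks fails, assuming independent,
    constant-rate (exponential) failures: rates add, so the mean divides by n."""
    return mtbf_hours / n_disks


def usable_fraction(level: int, n_disks: int) -> float:
    """Fraction of raw capacity holding user data.
    RAID 1 is modeled as every disk holding a full copy, as in FIG. 1."""
    overhead_disks = {1: n_disks - 1, 5: 1, 6: 2}[level]
    return (n_disks - overhead_disks) / n_disks


print(time_to_first_failure(400_000, 10))  # 40000.0
for level in (1, 5, 6):
    print(f"RAID {level}: {usable_fraction(level, 5):.0%}")
# RAID 1: 20%
# RAID 5: 80%
# RAID 6: 60%
```

The five-disk figures match FIGS. 1 to 3: five full copies, four data disks plus one parity, and three data disks plus two parities.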