1. Field of the Invention
The present invention relates to a disk array system, and more particularly to a disk array system for storing data in a plurality of magnetic disk devices (hard disk drive (HDD)) using a redundancy configuration.
2. Description of the Related Art
In the past, a redundant array of inexpensive disks (RAID) Level 0, which stores data on a single HDD, was the norm. Thereafter, so as to further enhance reliability, RAID Level 1 disk array systems, which store the same data on a plurality of HDD, and RAID Levels 3, 4 and 5 disk array systems, which store data in a distributed fashion on a plurality of HDD, were utilized.
With a RAID Level 0 magnetic disk device, when data stored on an HDD was lost, it was no longer possible to use that data.
With RAID Levels 1, 3, 4 and 5 disk array systems, data is made redundant, so that even if data stored on one of the built-in HDD units is lost, when the disk array system is viewed as a whole, that data can be restored. For this reason, workstations, network servers and other equipment that requires large-capacity external storage systems have come to make use of disk array systems that utilize RAID Level 1, 3, 4 or 5 arrays.
The operation of a dual magnetic disk device, called RAID Level 1, is explained using FIG. 9. When a disk array controller 53 receives a write request from a host, the write-requested data 60 is written to both HDD 54.sub.1 and 54.sub.2. When the data can subsequently be read from both HDD, comparing the HDD 54.sub.1 data against the HDD 54.sub.2 data makes a highly accurate data read possible.
Even when it is not possible to read data from one of the HDD, the data can still be obtained by reading it from the other HDD. For example, when it is not possible to read data from HDD 54.sub.1, the data can be obtained by reading it from HDD 54.sub.2 alone.
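The read behavior of the mirrored pair described above can be sketched as follows. This is only an illustrative model: the function name `mirrored_read` and the convention that `None` marks an unreadable HDD are assumptions made for the example, not part of the described system.

```python
from typing import Optional, Sequence


def mirrored_read(copies: Sequence[Optional[bytes]]) -> bytes:
    """Return the data held by a mirrored set of HDD.

    Each entry is the data read from one HDD, or None when that HDD
    could not be read (hypothetical convention for this sketch).
    """
    readable = [c for c in copies if c is not None]
    if not readable:
        # Neither copy is available, so the data cannot be obtained.
        raise IOError("no readable copy")
    # When more than one copy is readable, compare them for accuracy.
    if any(c != readable[0] for c in readable[1:]):
        raise ValueError("mirror copies disagree")
    return readable[0]


both_ok = mirrored_read([b"data 60", b"data 60"])
one_failed = mirrored_read([None, b"data 60"])  # first HDD unreadable
```

In the second call the data is obtained from the second HDD alone, which corresponds to the case where HDD 54.sub.1 cannot be read.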
FIG. 10 is a schematic depicting the configuration of a disk array system 51 when a redundancy configuration, called RAID Level 4, is used. When a disk array controller 53 receives a write request from a host, the write-requested data 60 is divided into sector units and written in a distributed fashion to storage regions 58.sub.1 -58.sub.4 of HDD 54.sub.1 -54.sub.4.
The disk array controller 53 does not simply distribute the data at this time. That is, it performs an Exclusive OR (XOR) operation on data D.sub.1, D.sub.2, D.sub.3, D.sub.4 stored in corresponding storage regions 58.sub.1 -58.sub.4, and writes the result of this operation, parity P, to storage region 58.sub.5 of HDD 54.sub.5. This XOR operation provides redundancy to the data. Therefore, all of the data is stored in a format that is capable of being reconstructed on the basis of the other data and the parity.
For example, the result of an XOR operation on parity P and data D.sub.1, D.sub.2, D.sub.3 is the same as data D.sub.4. Consequently, if it is not possible to read storage region 58.sub.4, when the data in the other HDD storage regions 58.sub.1 -58.sub.3 and the parity are read out and subjected to an XOR operation, data D.sub.4 can be obtained without reading storage region 58.sub.4.
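The parity relationship can be illustrated with a short sketch. The helper name `xor_blocks` and the two-byte sample values are assumptions chosen for the example; an actual controller operates on sector-sized blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Sample data for storage regions 58.sub.1 - 58.sub.4 (values are arbitrary).
d1, d2, d3, d4 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"

# Parity P, as written to storage region 58.sub.5.
parity = xor_blocks([d1, d2, d3, d4])

# Storage region 58.sub.4 is unreadable: rebuild D4 from the rest plus parity.
rebuilt_d4 = xor_blocks([d1, d2, d3, parity])
```

Because XOR is its own inverse, the same call reconstructs any single missing block, not only D.sub.4.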
Further, with a conventional disk array system, data specifying a failed HDD is stored in volatile memory. Consequently, when the system goes down as a result of a power outage or the like, data specifying the failed HDD is lost.
By comparison, various disk array systems have been proposed in which data specifying a failed HDD is stored in nonvolatile memory.
For example, in Japanese Patent Publication No. A7-56694, a control method for a system, which uses nonvolatile memory to store the status of a magnetic disk device, is proposed.
FIG. 11 depicts the configuration of this conventional disk array system. The disk array system 51 comprises an interface 52, a magnetic disk controller (disk array controller) 53, for example five magnetic disk devices (HDD) 54.sub.1 -54.sub.5, nonvolatile memory 55.sub.1 -55.sub.5 corresponding to each HDD, and a clock 56.
The interface 52 inputs data access requests from a host 61 (either read or write requests) to the disk array controller 53.
The disk array controller 53 comprises means for performing a data read or write operation by controlling the HDD 54 in accordance with the contents of a request output from the host, and means for determining the status of each HDD based on data in nonvolatile memory 55, and data stored in a storage management data storage region 57 created on the respective disk media. The clock 56 is used to rewrite date and time data in nonvolatile memory 55 storage management data when a failure occurs.
An overview of this prior example is provided by referring to FIGS. 12 and 13. This prior example uses a variable "i", which specifies in sequence a plurality of HDD, a parameter "N.sub.DISK ", which specifies a failed HDD when a failed HDD exists, and a parameter "N.sub.ERR ", in which the count value of the number of failed HDD is stored. Also, storage management data stored in each nonvolatile memory 55 depicted in FIG. 11 is stored in array A (i), and storage management data stored in the storage management data storage region 57 is stored in array B (i).
Since this prior example employs a redundancy configuration that makes data recovery possible even if an entire disk's worth of data is lost, when the value of "N.sub.ERR ", in which the count value of the number of failed HDD is stored, is 2 or more, it treats the entire disk array system as abnormal. And, based on the parameter which specifies a failed HDD, this prior example determines whether or not the failed HDD was replaced with a new HDD, and when it determines that the device is a new HDD, it automatically performs recovery processing.
Further, with this prior example, at initialization, the disk array controller 53 stores storage management data containing date/time information in the storage management data storage region 57 of the HDD, and in the nonvolatile memory 55 provided with the pertinent HDD. Also, when an HDD fails, the disk array controller stores the date and time the failure occurred in the nonvolatile memory provided with the failed HDD. Therefore, it is possible to check whether or not the pertinent HDD is the failed disk by comparing the contents of nonvolatile memory 55 with the contents of the storage management data storage region 57.
Specifically, first, as shown in FIG. 12, "0" is set in array variables A(i), B (i), "1" is set in counter i, "0" is set in failed HDD counter N.sub.ERR, and "0" is set in failed HDD identification parameter N.sub.DISK (S201).
Then, the i.sup.th nonvolatile memory is tested (S202). When that nonvolatile memory is not normal (S202: N), the i.sup.th HDD is treated as failed: N.sub.ERR is incremented by 1, i is set in N.sub.DISK (S207), and processing proceeds to S208.
When nonvolatile memory is normal (S202: Y), the contents of that nonvolatile memory are written to array variable A(i). Next, the i.sup.th HDD storage management data storage region is tested, and when the storage management data storage region is not normal (S204: N), processing proceeds to S207. When the storage management data storage region is normal (S204: Y), the contents of that storage management data storage region are set in B (i) (S205).
When array variables A (i) and B (i) do not match (S206: N), since the i.sup.th HDD is not a normal HDD, N.sub.ERR is incremented by 1, and i is set in N.sub.DISK (S207). Next, "1" is added to i (S208), and when i is not greater than the number of HDD (S209: Y), processing returns to S202 and the next nonvolatile memory and storage management data storage region are tested. This type of operation is repeated until i becomes greater than the total number of HDD.
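The scan of FIG. 12 can be modeled roughly as follows. This is a sketch under stated assumptions: the function name `scan_disks`, the use of `None` to represent a failed test, and the string timestamps are all hypothetical, and only the step labels that appear in the text are annotated.

```python
def scan_disks(nvram, regions):
    """Model of the per-HDD scan.

    nvram[k] / regions[k] hold the storage management data of HDD k+1,
    or None when the corresponding test fails (assumed convention).
    Returns (A, B, n_err, n_disk) with 1-based HDD numbers as in the text.
    """
    count = len(nvram)
    A = [0] * (count + 1)  # index 0 unused so that A[i] matches A(i)
    B = [0] * (count + 1)
    n_err, n_disk = 0, 0   # S201: initialize counters
    for i in range(1, count + 1):  # S208/S209: advance i until past the last HDD
        failed = False
        if nvram[i - 1] is None:            # S202: N
            failed = True
        else:
            A[i] = nvram[i - 1]             # nonvolatile memory contents into A(i)
            if regions[i - 1] is None:      # S204: N
                failed = True
            else:
                B[i] = regions[i - 1]       # S205
                failed = A[i] != B[i]       # S206: N when they do not match
        if failed:                          # S207
            n_err += 1
            n_disk = i
    return A, B, n_err, n_disk


# HDD 2 failed at some point, so its nonvolatile memory holds a later timestamp.
A, B, n_err, n_disk = scan_disks(
    nvram=["t0", "t1", "t0"],
    regions=["t0", "t0", "t0"],
)
```

Note that N.sub.DISK records only the most recently detected failed HDD, which is sufficient because any count of two or more is handled as an error.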
Next, the number of failed HDD N.sub.ERR is determined (S301). When N.sub.ERR is "0", all HDD are normal and the startup operation ends. When N.sub.ERR is "2" or larger, an error message is output (S302), and the restart operation ends. When N.sub.ERR is "1", A (N.sub.DISK) is compared to "0". When A (N.sub.DISK) is "0" (S303: Y), this means that nonvolatile memory was determined to be abnormal at step S202, so the contents of that nonvolatile memory were never stored in array A (N.sub.DISK). The N.sub.DISK.sup.th nonvolatile memory is therefore abnormal, and the N.sub.DISK.sup.th HDD is deemed unusable (S306).
When A (N.sub.DISK) is not "0" (S303: N), the N.sub.DISK.sup.th HDD may still be usable. Therefore, that HDD is tested (S304).
Then, when that HDD is determined to be abnormal (S305: N), that HDD is deemed unusable (S306), and when it is determined to be normal (S305: Y), recovery work is performed by writing data reconstructed from data in other HDD to that HDD (S307). Once recovery work is complete, A (N.sub.DISK) is written to that HDD's storage management data storage region, and the restart operation ends.
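The decision logic of FIG. 13 can be summarized in a sketch such as the one below. The function name `decide`, the returned strings, and the `hdd_is_normal` callback standing in for the HDD test are assumptions made for illustration.

```python
def decide(n_err, n_disk, A, hdd_is_normal):
    """Model of the restart decision (S301 onward).

    A is indexed by 1-based HDD number; hdd_is_normal(n) is a hypothetical
    callback that returns True when the test of HDD n succeeds.
    """
    if n_err == 0:                      # S301: no failed HDD
        return "all normal"
    if n_err >= 2:                      # S302: beyond the redundancy
        return "error"
    if A[n_disk] == 0:                  # S303: Y, nonvolatile memory abnormal
        return "unusable"               # S306
    if not hdd_is_normal(n_disk):       # S305: N
        return "unusable"               # S306
    return "recover"                    # S307: rebuild from the other HDD


# One mismatched HDD (number 2) whose test passes: recovery is started.
result = decide(1, 2, [0, "t0", "t1", "t0"], lambda n: True)
```

In this model the only inputs to the decision are the failure count, the nonvolatile memory contents, and the result of the HDD test.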
However, this prior example is inadequate in that when replacing a failed HDD, if an HDD that already has data stored therein is mistakenly connected as the new HDD, recovery data is written to the HDD with data already stored therein without an error of any kind being detected, thus resulting in the original data being lost.
That is, when there are a plurality of disk array systems configured as shown in FIG. 11, and a plurality of failed HDD, the procedures depicted in FIGS. 12 and 13 will give rise to a failure at restart following the replacement of the failed HDD.
For example, assume there are two disk array systems of the type shown in FIG. 11, and two HDD to be replaced, and assume that HDD 54a.sub.1 in disk array system 51a, and HDD 54b.sub.2 in disk array system 51b, as shown in FIG. 14, are the failed HDD.
Because disk array system 51a and disk array system 51b are each operating independently, the data being written to HDD 54a.sub.1 -54a.sub.5 and 54b.sub.1 -54b.sub.5 are different.
When replacing the failed HDD 54a.sub.1 and 54b.sub.2 with the new HDD 54c.sub.1 and 54c.sub.2, instead of disk array system 51a's failed HDD 54a.sub.1 being removed, HDD 54a.sub.2, which is operating normally, is removed, and replaced with the new HDD 54c.sub.1. Then, disk array system 51b's failed HDD 54b.sub.2 is removed, and previously-removed HDD 54a.sub.2, which operated normally in disk array system 51a, is mounted. After both magnetic disk devices have been replaced, they are simultaneously restarted.
That is, a magnetic disk device that is not failing is exchanged for a new replacement magnetic disk device, and a failing magnetic disk device is replaced with the removed normal device, which is mistaken for a replacement magnetic disk device.
In this case, since failed HDD 54a.sub.1 and new HDD 54c.sub.1 (in the place where HDD 54a.sub.2 was located) become failed HDD, disk array system 51a is in an unusable state. On the other hand, since disk array system 51b's failed HDD 54b.sub.2 is replaced by normally-operating HDD 54a.sub.2, recovery commences.
Since the data in disk array system 51a's HDD 54a.sub.2 is lost when the recovery operation is performed for disk array system 51b, data from 2 HDD, 54a.sub.1 and 54a.sub.2, are lost, and recovery work cannot be carried out for disk array system 51a.
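The outcome of this mistaken replacement can be traced with a simplified model. The HDD labels follow the reference numerals in the text, but the function `failed_slots` and the idea of representing each system as a list of mounted HDD are assumptions made purely for illustration.

```python
def failed_slots(slots, healthy_members):
    """Return 1-based slot numbers whose mounted HDD is not a healthy
    member of this disk array system (hypothetical helper)."""
    return [n for n, hdd in enumerate(slots, start=1) if hdd not in healthy_members]


# HDD mounted in each system after the mistaken replacement.
slots_51a = ["54a1", "54c1", "54a3", "54a4", "54a5"]  # 54a2 was pulled by mistake
slots_51b = ["54b1", "54a2", "54b3", "54b4", "54b5"]  # 54a2 mounted as "new"

# HDD that still hold valid data for each system (54a1 and 54b2 had failed).
healthy_51a = {"54a2", "54a3", "54a4", "54a5"}
healthy_51b = {"54b1", "54b3", "54b4", "54b5"}

failed_a = failed_slots(slots_51a, healthy_51a)  # two failed HDD: system unusable
failed_b = failed_slots(slots_51b, healthy_51b)  # one failed HDD: recovery starts

# With exactly one failed slot, recovery data is written to the HDD mounted there.
overwritten = slots_51b[failed_b[0] - 1] if len(failed_b) == 1 else None
```

Here `overwritten` identifies HDD 54a.sub.2, the one device that system 51a would have needed in order to recover.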
That is, the problem was that when a normal magnetic disk device was removed from one disk array system, connected to another disk array system, and recovery work performed, failure recovery could not be performed for one of the disk array systems. This kind of problem will not occur if the user of the magnetic disk device does not make a mistake when replacing the HDD. However, it is desirable for the magnetic disk device to make data redundant so that, in addition to enhancing its reliability as a storage device, it can also cope with human errors such as this.
Thus, the prior example does not take into consideration the possibility that data might be lost during recovery processing.
Further, treating nonvolatile memory 55.sub.1 -55.sub.4 as a single nonvolatile memory, and partitioning the regions in which storage management data is stored, enables nonvolatile memory to correspond to each HDD. But when nonvolatile memory is treated as a single memory, the nonvolatile memory test of the system control method shown in FIGS. 12 and 13 takes the form of the tests conducted on the respective storage management data storage regions of nonvolatile memory, resulting in cases where failures of nonvolatile memory itself are not detected.
For example, when a failure occurs in an HDD, and the storage management data storage region of nonvolatile memory is rewritten, nonvolatile memory, which has operated normally up until now, could generate a write failure. This could either make it impossible to rewrite data, or result in the same data that existed prior to rewrite being rewritten. Thereafter, if the device is restarted without detecting the failure, the test of the storage management data storage region where the write failure occurred in nonvolatile memory ends normally, and it appears that the device is fault free. And even if the test of the storage management data storage region where the write failure occurred generates an error, the other storage management data storage regions will appear normal.
When a nonvolatile memory failure occurs in a magnetic disk device employing a single nonvolatile memory, this creates situations wherein it is impossible to detect HDD failures, and the failing regions cannot be used, but other regions can be used. This makes it impossible to learn the correct status of a magnetic disk device.
This problem is believed to be caused by various failures that occur in the circuitry. For that reason, it is desirable that the device possess mechanisms for detecting failures when problems like this occur.