The present invention relates to a disk array device for executing data I/O processing by concurrently accessing a plurality of disk devices, and more specifically, a disk array device for maintaining consistency of data by executing, when write processing is interrupted due to, for instance, power failure, recovery processing of the write processing using the data stored therein.
A disk device having nonvolatile memory, a large capacity, a capability for high speed data transfer, and other features such as a magnetic disk or an optical disk device has been widely used as an external storage device for a computer system. Demands for a disk device include high speed data transfer, high reliability, a large capacity, and a low price. A disk array device satisfies the requirements described above. The disk array device comprises a plurality of compact disk devices for distributing and recording data therein, and for enabling concurrent access to the data.
With the disk array device, by concurrently executing data transfer to a plurality of disk devices, data transfer can be executed at a higher rate than the data transfer rate of a single disk device. Further, by recording, in addition to data, redundant information such as parity data, it becomes possible to detect and correct a data error caused by, for instance, a failure of a disk device. Also, a reliability as high as that obtained by duplicating contents of a disk device may be achieved with a lower cost.
It is generally recognized that a disk array device is a new recording medium simultaneously satisfying the three requirements for low price, high speed, and high reliability. The requirement that is most important and most difficult to maintain is high reliability. A single disk constituting a disk array is low in cost, and does not require high reliability. Accordingly, to realize a disk array device, high reliability must be maintained.
David A. Patterson and others at the University of California at Berkeley published reports in which disk array devices that provide redundancy of data by distributing a large volume of data to a number of disks at a high speed are classified from levels 1 to 5 (ACM SIGMOD Conference, Chicago, Ill., Jun. 1-3, 1988, pp.109-116).
The classification of disk array devices proposed by Patterson et al. is abbreviated as RAID (Redundant Array of Independent Disks). Next, brief descriptions are provided for RAID 0 to 5.
FIG. 32 shows a RAID 0 disk array device. In a RAID 0 disk array device, as shown by data A to I, a disk array control unit 10 distributes data to disk devices 32-1 to 32-3 according to an I/O request from a host computer 18, and data reliability for disk error is not insured.
A RAID 1 disk array device has, as shown in FIG. 33, a mirror disk device 32-2 in which copies A′ to C′ of data A to C stored in the disk device 32-1 are stored. For RAID 1, use efficiency of the disk device is low, but data reliability is insured and can be realized with simple controls, resulting in this type of disk array device being widely used.
A RAID 2 disk array device stripes (divides) data in units of a bit or a byte, and concurrently executes data write or data read to and from each disk device. The striped data is recorded in the same physical sectors in all the disk devices. Hamming code generated from data is used as error correction code. The RAID 2 disk array device has, in addition to disk devices for data storage, a disk device for recording the Hamming code therein, and identifies a faulty disk from the Hamming code to restore data. By having data redundancy based on the Hamming code, data reliability can be insured, even if a disk device fails, but the use efficiency of disk devices is rather low, so that this type of disk array device has not been put into practical use.
A RAID 3 disk array device has the configuration as shown in FIG. 34. As shown in FIG. 35, for instance, data a, b, and c are divided by units of a bit or a sector to data a1 to a3, b1 to b3, and c1 to c3. Parity p1 is computed from the data a1 to a3, parity p2 is computed from the data b1 to b3, and parity p3 is computed from data c1 to c3. The disk devices 32-1 to 32-4 shown in FIG. 34 are concurrently accessed to write the data therein.
In a case of RAID 3, redundancy of data is maintained with parity. Further, a time required for data write can be reduced by concurrently processing the divided data. However, a concurrent seek operation is required for all the disk devices 32-1 to 32-4 for each access for data write or data read. This scheme is effective when a large volume of data is continuously treated. However, in the case of, for instance, transaction processing for accessing a small volume of data at random, the capability for high-speed data transfer cannot be effectively used, and efficiency is lowered.
A RAID 4 disk array device divides one piece of data by sector and then writes the divided data in the same disk device. For instance, as shown in FIG. 36, in the disk device 32-1, data a is divided into sector data a1 to a4 and the divided data is written therein. The parity is stored in a disk device 32-4 unequivocally decided. Herein parity p1 is computed from data a1, b1, and c1, parity p2 from data a2, b2, and c2, parity p3 from data a3, b3, and c3, and parity p4 from data a4, b4, and c4.
Data can concurrently be read from the disk devices 32-1 to 32-3. When reading data a, sector data a1 to a4 are successively read out and synthesized by accessing sectors 0 to 3 of the disk device 32-1. When writing data, the data prior to write processing and the parity are read, new parity is computed, and then the new data and the new parity are written. Thus, a total of four disk accesses (two reads and two writes) are required for one write operation.
For instance, when sector data a1 in the disk device 32-1 is updated (rewritten), in addition to data write updating, operations are required for reading old data (a1 old) at an updated position and old parity (p1 old) of the corresponding disk device 32-4, computing new parity (p1 new) consistent with the new data (a1 new), and then writing the data.
Also, when writing data, the disk device 32-4 for parity is always accessed so that data cannot be simultaneously written in a plurality of disk devices. For instance, even if it is tried to simultaneously write data a1 in the disk device 32-1 and data b2 in the disk device 32-2, it is required to read the parities p1 and p2 from the same disk device 32-4 and then write the data after computing new parities. Thus the data cannot be simultaneously written in the disk devices.
RAID 4 is defined as described above, but this type of disk array device provides few merits, so there is no actual movement for introduction of this type of disk array device into practical use.
In a RAID 5 disk array device, a disk device is not dedicated for parity, so operations for data read and data write can be concurrently executed. As shown in FIG. 37, parities for sectors are written in different disk devices, respectively. Herein parity p1 is computed from data a1, b1, and c1, parity p2 from data a2, b2, and d2, parity p3 from data a3, c3, and d3, and parity p4 from data b4, c4, and d4.
As for concurrent operations for data read and data write, for instance, data a1 for sector 0 of the disk device 32-1 and data b2 for sector 1 of the disk device 32-2 have their parities p1 and p2 stored in different disk devices, 32-4 and 32-3 respectively, so that the operations for reading data and writing data can be concurrently executed. It should be noted that the overhead required for four accesses is the same as that for RAID 4.
As described above, for RAID 5, operations for data read and data write can be concurrently executed by accessing a plurality of disk devices asynchronously. Thus, this type of disk array device is suited to transaction processing executed by accessing a small volume of data at random.
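The parity placement described for FIG. 37 can be illustrated with a minimal sketch. It assumes four disk devices indexed 0 to 3 (standing for 32-1 to 32-4) and a parity position that moves back by one disk per sector, which matches the parity groups listed above; the function names `parity_disk` and `can_write_concurrently` are hypothetical and are not taken from the specification.

```python
NUM_DISKS = 4  # disk devices 32-1 to 32-4, indexed 0 to 3 (assumed layout)

def parity_disk(sector):
    """Return the index of the disk assumed to hold parity for a sector."""
    return (NUM_DISKS - 1 - sector) % NUM_DISKS

def can_write_concurrently(write_a, write_b):
    """Each write is (data_disk, sector); writes overlap if they share a disk."""
    disks_a = {write_a[0], parity_disk(write_a[1])}
    disks_b = {write_b[0], parity_disk(write_b[1])}
    return disks_a.isdisjoint(disks_b)

# a1 (disk 32-1, sector 0) and b2 (disk 32-2, sector 1) use different parity disks.
a1_and_b2 = can_write_concurrently((0, 0), (1, 1))
# Two writes in the same sector share one parity disk and cannot overlap.
a1_and_b1 = can_write_concurrently((0, 0), (1, 0))
```

With a dedicated parity disk as in RAID 4, `parity_disk` would return the same index for every sector, so no two writes could ever be disjoint.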
In the conventional types of disk array devices described above, when the power supply is interrupted for some reason while data write to a disk device is being executed, system control can be started from the same operation for writing data after recovery of the power supply in the RAID 1 to RAID 3 disk array devices. However, the same write operation cannot be restarted after recovery of the power supply in the RAID 4 and RAID 5 disk array devices for the following reasons.
When writing data in a RAID 4 or a RAID 5 disk array device, parity is decided by computing an exclusive-OR (written as (+) below) for data in a plurality of disk devices using the equation (1) below and the parity is stored in a disk device for parity.
Data a (+) data b (+) . . . = Parity P    (1)
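Equation (1) can be tried out with a short sketch. The block contents and the helper name `xor_parity` are illustrative assumptions; the sketch only shows that the byte-wise exclusive-OR of the data blocks gives the parity, and that the same operation recovers one unreadable block from the remaining blocks and the parity.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise exclusive-OR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

data_a = bytes([0x12, 0x34])
data_b = bytes([0xAB, 0xCD])
data_c = bytes([0x0F, 0xF0])
parity = xor_parity([data_a, data_b, data_c])

# If data_b becomes unreadable, the surviving blocks and the parity restore it.
rebuilt_b = xor_parity([data_a, data_c, parity])
```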
Sites for storage of data and parity are fixed for RAID 4 to particular disks 32-1 to 32-4 as shown in FIG. 36. In contrast, for RAID 5, sites for storage of parity are distributed to the disk devices 32-1 to 32-4 as shown in FIG. 37 to dissolve concentration of access to a particular disk or particular disks due to operations for reading and writing parity.
When reading data from these RAID 4 and RAID 5 types of disk array devices, data in the disk devices 32-1 to 32-4 is not rewritten, so consistency of parity is maintained. When writing data, however, parity must also be rewritten.
For instance, when old data (a1 old) in the disk device 32-1 is rewritten to new data (a1 new), parity p1 for all the data in the disk device can be maintained by updating parity using equation (2):
Old data (+) old parity (+) new data = New parity    (2)
As shown by this equation (2), it is necessary to read out old data and old parity in the disk device first, and then an operation for writing new data and operations for generating and writing the new parity are executed.
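As a check on equation (2), the following sketch compares the parity obtained from the old data, old parity, and new data with the parity recomputed from scratch over the whole parity group. The block values and the helper name `xor_blocks` are assumptions made for illustration.

```python
def xor_blocks(*blocks):
    """Byte-wise exclusive-OR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

old_data = b"\x11\x22"
other_data = [b"\x33\x44", b"\x55\x66"]   # unchanged blocks of the parity group
old_parity = xor_blocks(old_data, *other_data)

new_data = b"\xaa\xbb"
# Equation (2): only the old data and the old parity have to be read.
new_parity = xor_blocks(old_data, old_parity, new_data)
# Reference value: parity recomputed over the whole updated group.
full_recompute = xor_blocks(new_data, *other_data)
```

The two results agree because the old data cancels out of the old parity under exclusive-OR, which is why the update depends on the old data and old parity on disk being mutually consistent.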
Next, a detailed description is provided for a method of rewriting data in a RAID 5 type of disk array device with reference to FIG. 38. FIG. 38 illustrates a sequence for rewriting data. In FIG. 38, an array controller 50 is connected to 5 disk devices (Devices 0, 1, 2, 3, 4) 32-1, 32-2, 32-3, 32-4, and 32-5 to control the disk devices, and a host computer 18 is connected to the array controller 50 via a control unit 10 to control the array controller 50.
For instance, when rewriting data (D0) in the disk device 32-1, the control unit 10 first issues a write command to the array controller 50, and also transfers write data (D0 new) 40 to the array controller 50. The array controller 50 receives the write command from the control unit 10, and reads out old data (D0 old) 40-1 from the disk device 32-1. The array controller 50 also reads out old parity (Dp old) 48 from the disk device 32-5.
Then the array controller 50 writes the new data (D0 new) in the disk device 32-1. The array controller 50 computes an exclusive-OR (EOR) with a logic circuit 12 using old parity (Dp old) 48, old data (D0 old) 40-1, and new data (D0 new) 40 to generate new parity (Dp new) 48-1, and writes the new parity in the disk device 32-5. Then the array controller 50 reports to the control unit 10 that the write operation finished normally, and the control unit 10 acknowledges the report, thus finishing data updating.
If power is interrupted while writing new data or new parity in a RAID 4 or a RAID 5 type of disk array device, it becomes impossible to determine where data has been written normally, and consistency of parity is lost. If the processing for writing the same data is executed after recovery of power, old data and old parity are read from a disk device or disk devices with consistency of parity having been lost therefrom, so that inconsistent parity is generated and the data write operation is disadvantageously finished.
To solve the problem described above, the present inventors proposed RAID 4 and RAID 5 types of disk array devices in which, even if power is interrupted during an operation for writing new data or new parity, the interrupted operation for writing the same data or same parity can be restarted (refer to Japanese Patent Laid-Open Publication No. HEI 6-119126). The disk array device according to this invention is shown in FIG. 39.
In this disk array device, at least processing state data 38 indicating a processing state of a writing unit 60, as well as a parity updating unit 70 and new data 40 transferred from an upper device 18, are stored in a nonvolatile memory 34 in preparation for a situation in which power is interrupted, and when power is restored, a restoring unit 80 executes the processing for recovery using the new data 40 maintained in the nonvolatile memory 34, with reference to the processing state data 38 in the nonvolatile memory 34 when the write processing has been interrupted.
However, a subsequent study showed that in the invention disclosed in Japanese Patent Laid-Open Publication No. HEI 6-119126, if any one of a plurality of disk devices fails, sometimes recovery processing cannot be executed. In the configuration shown in FIG. 38, for instance, if the disk device 32-2 is faulty, then when power is cut off and the operation for writing data is interrupted while rewriting new data (D0 new) or new parity (Dp new), not only data (D0) in the disk device 32-1 and parity (Dp) in the disk device 32-5 are broken, but it also becomes impossible to reconstruct data (D1) during data striping constituting the same parity group in the faulty disk device 32-2, resulting in the data being lost.
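The loss of data (D1) described above can be reproduced with a small sketch. It assumes one-byte blocks and an interruption that occurs after new data (D0 new) has been written but before new parity (Dp new) has been written; the values and the helper name `xor_blocks` are illustrative only.

```python
def xor_blocks(*blocks):
    """Byte-wise exclusive-OR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0_old, d1, d2, d3 = b"\x01", b"\x02", b"\x04", b"\x08"
dp_old = xor_blocks(d0_old, d1, d2, d3)
d0_new = b"\x10"

# Disk device 32-2 is faulty, so D1 can only be rebuilt from the other disks.
rebuilt_before_write = xor_blocks(d0_old, d2, d3, dp_old)

# Assumed interruption: D0 new is on disk, but the parity is still Dp old.
rebuilt_after_interruption = xor_blocks(d0_new, d2, d3, dp_old)
```

Before the write, the rebuilt value equals D1; after the interrupted write, the stale parity produces a different value, so D1 can no longer be reconstructed from the disks alone.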
Also, it is conceivable that the invention disclosed in Japanese Patent Laid-Open Publication No. HEI 6-119126 is applied to a RAID 5 disk array device having a plurality of array controllers. A nonvolatile memory is provided in a disk array device having a plurality of array controllers, new data and processing state data are stored in the nonvolatile memory, and the processing for data recovery is executed when data write processing is not finished normally due to power failure, or for any other reason, using the data.
However, when a plurality of array controllers are booted up with independent power supply units, respectively, a time delay is generated between them. For this reason, if power is restored after write processing has not finished normally in a plurality of array controllers, an array controller that starts later may execute recovery processing, using data stored in its own nonvolatile memory, on a parity group that another array controller has already recovered and then updated, so that the latest data is disadvantageously lost.
It is an object of the present invention to provide a disk array device which can restart, even if power goes down during data write processing, the interrupted data write processing after recovery of power to complete the processing, especially a disk array device in which data can be restored even if any of a plurality of disk devices is faulty, or a disk array device having a plurality of array controllers in which data can be restored.
FIG. 1 is an explanatory view showing an operational principle of a disk array device according to the present invention. As shown in FIG. 1, the disk array device belongs to the category of RAID 4 or RAID 5, and comprises a control unit 10, an array controller 50, and a plurality (for instance, 5 units in FIG. 1) of disk devices 32-1, 32-2, 32-3, 32-4, and 32-5.
Provided in the control unit 10 are a channel interface adapter 16, a nonvolatile memory 34, a special write executing unit 110, and a data reproducing unit. An upper device 18, such as a host computer, is connected via the channel interface adapter 16 to the disk array device. The nonvolatile memory 34 stores therein new data transferred from the upper device.
When the write processing is interrupted (for example, due to power failure) once and then restarted, the new parity cannot be generated because the old data and old parity become inconsistent. As a result, the special write executing unit 110 executes special write processing to perform data restoration. The new data has already been stored in the nonvolatile memory 34. The special write executing unit 110 reads the data (other data) from all the other disk devices excluding the specified data disk device to receive the new data and the parity disk device, generates new parity using the new data and the other data, and writes the new data in the specified disk device and the generated new parity in the parity disk device. The upper device 18 instructs the new data to be written over the old data in the specified disk device.
The special write executing unit 110 has a data write unit 113 and a parity generating unit 116. The data write unit 113 overwrites a preset special value, or preferably new data stored in the nonvolatile memory 34, when executing the special write processing, at a specified write position in the specified disk device (for instance, 32-1).
When executing the special write processing, the parity generating unit 116 generates the new parity using the new data stored in the nonvolatile memory 34 and the other data read from the other disk devices from a position corresponding to the position where the new data is to be written in the specified disk device 32-1. After generation of the new parity, the new data is written in the specified disk device 32-1 and the new parity is written in the parity disk device 32-5.
The data reproducing unit 120 issues a request to the special write executing unit 110, in effect to start the special write processing, when old data cannot be read out from the specified disk device and old parity cannot be read out from the parity disk device, because of an interruption in the write processing.
Provided in the array controller 50 are a plurality (for instance, 5 units in FIG. 1) of device interface adapters 54-1, 54-2, 54-3, 54-4, and 54-5. Data error detecting units 154-1, 154-2, 154-3, 154-4, and 154-5 are provided in the device interface adapters 54-1, 54-2, 54-3, 54-4, and 54-5, respectively. The data error detecting units 154-1, 154-2, 154-3, 154-4, and 154-5 detect generation of an error when reading out data from the disk devices 32-1, 32-2, 32-3, 32-4, and 32-5, and report generation of the error to the data reproducing unit 120.
In a disk array device having the configuration described above, the processing for data recovery is executed as described below. After processing for writing new data is interrupted due to power failure or for other reasons and write processing is restarted because the power supply is restarted or for other reasons, an attempt is made to read the old parity, stored at a position corresponding to the disk write position for new data, from the disk device for parity (for instance, 32-5). In this step, a read error is detected by the data error detecting unit (for instance, 154-5) because consistency of parity has been lost due to interruption of the previous write processing.
Then the data error detecting unit (for instance, 154-5) reports the occurrence of an error to the data reproducing unit 120. When the data reproducing unit 120 receives the report, it reads out the other data, for generating the new parity, from the disk devices (for instance, 32-2, 32-3, 32-4) other than the specified disk device (for instance, 32-1) and the disk device for parity (for instance, 32-5) each belonging to the parity group in which the read error occurred.
When the special write executing unit 110 receives a request to shift to the special write processing mode, the data write unit 113 overwrites a preset special value, or preferably new data stored in the nonvolatile memory 34 at specified write positions in the specified disk device (for instance, 32-1).
The parity generating unit 116 generates new parity using data and parity stored at positions corresponding to specified write positions in a disk device (for instance, 32-1), which has been instructed to receive new data, as well as in a disk device for parity (for instance, 32-5), and writes the new parity in the disk device for parity (for instance, 32-5). Then the special write processing mode terminates.
It should be noted that, when a preset special value is overwritten at a specified write position in a specified disk device (for instance, 32-1), such as when new data is not stored in the nonvolatile memory 34, the data write unit 113 records that the special value was overwritten, for example by setting a flag in the memory, and reports a read error when a read request is issued for that data.
As described above, a disk array device according to the present invention is a disk array device for data updating by reading out old data stored at a write position of a specified disk device, then writing new data transferred from an upper device at the write position. A new parity is generated according to an old parity stored at a disk write position for the new data on a disk device for parity and the old data, as well as the new data, and the new parity is written at a disk storage position for the old parity. The disk array device comprises a nonvolatile memory for storing therein new data transferred from an upper device. A special write executing unit performs recovery processing where write processing is interrupted and then restarted, and it is impossible to restore parity because required data cannot be normally read out from the parity disk device or the specified disk device. A new parity is generated by using (1) other data stored at a position corresponding to a disk write position for the new data on the disk devices other than the specified disk device and parity disk device and (2) new data stored in the nonvolatile memory.
With the disk array device according to the present invention, when write processing that has been interrupted due to power failure or for some other reason is restarted, data recovery processing is executed, even if the specified disk device or the parity disk device is faulty, by generating new parity (Dp new) using (1) other data (D1, D2, D3) stored at positions corresponding to disk write positions for new data (D0 new) in the other disk devices and (2) new data (D0 new) stored in the nonvolatile memory.
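The special write processing summarized above can be sketched as follows. The sketch assumes five disks modeled as Python lists of byte blocks and a hypothetical function name `special_write`; it is an illustration of the principle, not the implementation of the special write executing unit 110.

```python
def xor_blocks(*blocks):
    """Byte-wise exclusive-OR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def special_write(disks, data_disk, parity_disk, position, nvram_new_data):
    """Rebuild parity without reading the old data or the old parity."""
    other_data = [disk[position] for index, disk in enumerate(disks)
                  if index not in (data_disk, parity_disk)]
    new_parity = xor_blocks(nvram_new_data, *other_data)
    disks[data_disk][position] = nvram_new_data
    disks[parity_disk][position] = new_parity
    return new_parity

# Five disks with one block each; D0 and Dp were left inconsistent.
disks = [[b"\xff\xff"], [b"\x01\x02"], [b"\x03\x04"], [b"\x05\x06"], [b"\x00\x00"]]
d0_new = b"\xaa\xbb"
dp_new = special_write(disks, data_disk=0, parity_disk=4, position=0,
                       nvram_new_data=d0_new)
```

Because neither the old data nor the old parity is used as an input, the result does not depend on how far the interrupted write had progressed.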
In the disk array device of the present invention, the new parity is generated from the data stored at positions corresponding to disk write positions for the new data on all disk devices other than the specified disk device and the parity disk device, and the generated new parity is stored in the nonvolatile memory. Furthermore, the special write executing unit 110 concurrently writes the new data stored in the nonvolatile memory into the specified disk device, and generated new parity into the parity disk device.
A disk array device according to the present invention is also characterized in that a write flag indicating that write processing is being executed and management information indicating progression of the write processing are stored in the nonvolatile memory from the time when a write processing instruction is received from an upper device until the write operation finishes in the normal state.
With the disk array device according to the present invention, a write flag indicating whether an operation for writing data into a disk device has finished normally and a status indicating a stage of the write processing are stored in the nonvolatile memory. If the write processing has not finished normally and the power supply is then restored, whether any data not written normally remains can easily be checked by referring to the write flag. Also, recovery processing can be restarted from the point where write processing was interrupted by referring to the status, so that recovery processing can be rapidly executed.
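How a write flag and a status might be consulted after power is restored is sketched below. The stage names in `Stage`, the record layout, and the function `remaining_stages` are all hypothetical; the specification only states that a write flag and a status indicating the stage of write processing are kept in the nonvolatile memory.

```python
from enum import IntEnum

class Stage(IntEnum):
    """Hypothetical stages of one write operation, in execution order."""
    RECEIVED = 0
    OLD_READ = 1
    DATA_WRITTEN = 2
    PARITY_WRITTEN = 3

def remaining_stages(record):
    """Return the stages still to be executed for one nonvolatile-memory record."""
    if not record["write_flag"]:
        return []  # the write finished normally, nothing to recover
    return [stage for stage in Stage if stage > record["status"]]

interrupted = {"write_flag": True, "status": Stage.OLD_READ}
finished = {"write_flag": False, "status": Stage.PARITY_WRITTEN}
todo = remaining_stages(interrupted)
```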
A disk array device according to another embodiment of the present invention is a disk array device comprising a plurality of array controllers, each driven by an independent power supply unit for writing and reading data and parity to and from a plurality of disk devices. A control unit controls the array controllers and executes data updating by first reading out old data stored at a write position on a specified disk device. Then, the control unit writes new data transferred from an upper device at the write position. The control unit also writes a new parity in a parity disk device at disk storage positions for the old parity. The new parity is generated according to an old parity, old data, and new data read from storage positions corresponding to disk write positions for the new data.
The control unit comprises a nonvolatile memory for storing therein at least the new data, old data, and old parity, when an upper device provides an instruction for write processing to a disk device. When a power supply is cut off to one of the array controllers, a task generating unit within the control unit generates a task for allocating the write processing being executed or to be executed by this array controller to other array controllers. The control unit also contains a task information table for storing therein the task generated by the task generating unit.
Each of the array controllers comprises a power monitoring unit for mutually monitoring the power supply state and a power supply stop reporting unit for reporting to the control unit that stoppage of the power supply to other array controller or controllers has been detected. The array controllers also contain a parity generating unit for generating a new parity according to (1) data read from a storage position corresponding to a disk write position for the new data on all disks, excluding the disk device in which it has been specified to write new data and the disk device for parity, and (2) new data transferred from the nonvolatile memory.
With the disk array device according to the present invention, when a write instruction is issued from an upper device, new data (D0 new), old data (D0 old), and old parity (Dp old) are stored in nonvolatile memory prior to execution of the write processing to a disk device. Thus, when a problem occurs in the write processing by one of the array controllers, another array controller can continue the write processing instead of the faulty array controller, thereby maintaining the consistency of the data.
A disk array device according to the present invention is also characterized in that management information indicating progression of write processing is stored in the nonvolatile memory, and the task generating unit generates a task according to the management information stored in the nonvolatile memory.
With the disk array device according to the present invention, a status indicating a stage of the write processing and an ID flag indicating an array controller having executed the process indicated by the status are stored in nonvolatile memory, and a task for alternative processing is generated according to the status so that the write processing can be restarted from the interrupted point.
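Task generation from the status and the ID flag can be pictured with the following sketch. The record fields, the status strings, the round-robin assignment, and the function name `generate_takeover_tasks` are assumptions for illustration; the specification does not prescribe how tasks are distributed among the remaining array controllers.

```python
from itertools import cycle

def generate_takeover_tasks(nvram_records, failed_id, live_ids):
    """Build task-table entries for writes the failed controller did not finish."""
    assignee = cycle(live_ids)  # assumed policy: rotate over live controllers
    tasks = []
    for record in nvram_records:
        if record["controller_id"] == failed_id and record["status"] != "DONE":
            tasks.append({
                "write_id": record["write_id"],
                "assigned_to": next(assignee),
                "resume_after": record["status"],
            })
    return tasks

records = [
    {"write_id": 1, "controller_id": "A", "status": "DATA_WRITTEN"},
    {"write_id": 2, "controller_id": "B", "status": "OLD_READ"},
    {"write_id": 3, "controller_id": "A", "status": "DONE"},
]
task_table = generate_takeover_tasks(records, failed_id="A", live_ids=["B"])
```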
A disk array device according to another embodiment of the present invention is a disk array device comprising a plurality of array controllers, each driven by an independent power supply unit for writing and reading data and parity to and from a plurality of disk devices. A control unit controls the array controllers, and executes data updating by first reading out old data stored at a write position on a specified disk device. Then, the control unit writes new data transferred from an upper device at the write position. The control unit also writes a new parity in a parity disk device at disk storage positions for the old parity. The new parity is generated according to an old parity, old data, and new data read from storage positions corresponding to disk write positions for the new data.
Each of the plurality of array controllers comprises a nonvolatile memory for storing, when an upper device provides an instruction for write processing to a disk device, at least the new data, old data, and old parity. A communicating unit within each array controller executes a transaction of data and parity with another array controller. The communicating unit transmits, when the new data, old data, and old parity have been stored in the nonvolatile memory in one of the array controllers, the new data, old data, and old parity stored in the nonvolatile memory from the one array controller to the other array controller before write processing is executed to a disk device. The communicating unit also receives the new data, old data, and old parity sent from the one array controller to the other array controller and stores them in the nonvolatile memory of the other array controller.
With the disk array device according to the present invention, when an upper device issues an instruction for write processing, new data (D0 new), old data (D0 old), old parity (Dp old) and new parity (Dp new) are stored in the nonvolatile memory of one of the array controllers before execution of the write processing to a disk device. New data (D0 new), old data (D0 old), and old parity (Dp old) are copied into a nonvolatile memory of another array controller so that, even if the processing for writing data and parity is not finished in the normal status due to power failure or for some other reason, recovery processing can easily be executed when power supply is restarted by using new data (D0 new) stored in a nonvolatile memory in one of the array controllers or in the other one.
A disk array device according to the present invention is also characterized in that management information indicating progression of write processing is stored in the nonvolatile memory.
With the disk array device according to the present invention, a status indicating a stage of write processing is stored in the nonvolatile memory so that when write processing is not finished in the normal state and then the power supply is restarted, the write processing can be restarted from the interrupted point by referring to the status.
A disk array device according to the present invention is also characterized in that, when write processing is interrupted in one of the array controllers and then the array controller interrupted as described above is restored to a stable state allowing normal operation, the interrupted array controller, or the other array controller having received the new data, old data, and old parity from the interrupted array controller before interruption of write processing, executes the interrupted write processing again according to the new data, old data, and old parity stored in nonvolatile memory.
With the disk array device according to the present invention, interrupted write processing is restarted according to new data (D0 new), old data (D0 old), and old parity (Dp old) stored in nonvolatile memory so that recovery processing can be easily executed.
A disk array device according to another embodiment of the present invention is a disk array device comprising a plurality of disk devices and an array controller for writing and reading data and parity to and from the disk devices. Data is updated by reading old data stored at a write position of a specified disk device and then writing new data transferred from an upper device at the write position. A new parity is generated according to an old parity, old data, and new data read from a storage position corresponding to a disk write position for the new data, and the new parity is written on a disk device for parity at a disk storage position for the old parity. The disk array device further comprises a non-failure power supply unit for backing up power supply to the plurality of disk devices, as well as supplying power to the array controller.
With the disk array device according to the present invention, even when AC input to a power supply unit is stopped, or when power supply between a power supply unit and an array controller or that between a power supply unit and a disk device is down for some reason, power supply is continuous so that write processing by an array controller is not interrupted and consistency of data is maintained.
A disk array device according to another embodiment of the present invention updates data by first reading out old data stored at a write position of a specified disk device and then writing new data transferred from an upper device at the write position. A new parity generated according to an old parity, old data, and new data stored at a write position corresponding to the disk write position for the new data is written on a parity disk device at the disk storage position for the old parity. The disk array device further comprises a special write executing unit for executing recovery processing. Recovery processing is executed when data in at least two disk devices cannot be read out normally for a data group serving as a basis for parity. In that case, data is arbitrarily written in the two disk devices from which data cannot be read out normally, and a new parity is generated using the arbitrarily written data and the data normally read out from the data group serving as a basis for parity. The disk array device also comprises a data error detecting unit for issuing a data check response to a request to read the data arbitrarily written by the special write executing unit.
With the disk array device according to the present invention, although data written in a disk device from which data cannot normally be read out cannot be reproduced, writing arbitrary data in the disk device and generating new parity enables the disk device to operate normally in response to a write instruction from an upper device. For this reason, recovery processing from an upper device can be executed.
Also, with the disk array device according to the present invention, by memorizing that arbitrary data has been written at a place where unreadable data is stored in a disk device from which data cannot normally be read out, and also by sending an error or the like in response to a read instruction from an upper device for the written arbitrary data, it is possible to prevent the arbitrary data from erroneously being sent to the upper device.
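The behavior described above, in which arbitrarily written data is remembered and answered with an error until the upper device rewrites it, can be sketched as follows. The class `BlockStore`, the exception `DataCheckError`, and the zero filler are hypothetical names and values chosen for illustration.

```python
class DataCheckError(Exception):
    """Raised when a read targets a block that holds arbitrary filler data."""

class BlockStore:
    def __init__(self):
        self.blocks = {}
        self.arbitrary = set()  # positions known to hold arbitrary data

    def write_arbitrary(self, position, filler=b"\x00"):
        """Write filler so parity can be regenerated, and remember the position."""
        self.blocks[position] = filler
        self.arbitrary.add(position)

    def write(self, position, data):
        """A normal write from the upper device makes the block valid again."""
        self.blocks[position] = data
        self.arbitrary.discard(position)

    def read(self, position):
        if position in self.arbitrary:
            raise DataCheckError(f"block {position} holds arbitrary data")
        return self.blocks[position]

store = BlockStore()
store.write_arbitrary(7)
```

Calling `store.read(7)` at this point raises `DataCheckError`; after `store.write(7, b"new")`, the same read returns the new data.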