The contents of 2000-167484, filed Jun. 5, 2000 in Japan, are incorporated herein by reference.
1. Field of the Invention
The present invention relates to disk array devices including multiple disk devices equipped with a restorative function used when the disk devices fail.
2. Description of the Related Art
In recent years, devices have become available in which subsystem components are multiplexed to provide a degree of redundancy. A localized trouble-shooting function and the redundant configuration of these devices increase their continuous usability and allow for the automatic restoration of data when a disk fails. The data redundancy methods are categorized into six levels ranging from RAID 0 to RAID 5 (RAID stands for Redundant Array of Inexpensive Disks).
FIG. 14(a) shows a schematic diagram of a RAID 4 System 100. As shown in FIG. 14(a), RAID 4 employs a parity system for data restoration information. The RAID 4 System 100 shown in FIG. 14(a) includes several data disks D0, D2, . . . storing the data allocated in multiple read/write units, a parity-generating unit P and a disk device DP which stores the parity.
In the RAID 4 System 100 shown in FIG. 14(a), the data is allocated into multiple units designated by A0, A2, . . . . Generally, these data units A0, A2, . . . are of fixed length. The allocated data A0, A2, . . . are distributed to and stored on the data disks D0, D2, . . . while the parity is stored on the dedicated disk DP. In the following description, data that is allocated to different disks and stored in this way is referred to as redundant identical group data or simply as redundant group data. The group of disks on which this data is stored is referred to as the redundant group disks. The parity can also be referred to as redundant data.
When there is a problem with a disk, the data on the disk is regenerated from the remaining identical group data and parity (redundant data).
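The regeneration described above can be illustrated with a minimal sketch, assuming the parity is the bytewise exclusive OR of the data units of one redundant group. The function name `xor_blocks` and the sample byte values are hypothetical and serve only as an illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise exclusive OR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical fixed-length data units of one redundant group.
group = [b"\x11\x22", b"\x0f\xf0", b"\xaa\x55"]

# Parity (redundant data) generated from the group data.
parity = xor_blocks(*group)

# Suppose the disk holding group[1] fails: its data is regenerated
# from the remaining group data and the parity.
rebuilt = xor_blocks(group[0], group[2], parity)
assert rebuilt == group[1]
```

Because exclusive OR is its own inverse, any single missing unit of the group can be regenerated in the same way, but two missing units cannot.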
RAID 4 is capable of reading out multiple data simultaneously but cannot write multiple data at the same time. When updating data, the RAID 4 System 100 always reads the parity and the data as they were before the update, creates the updated parity, and only then writes, which requires additional disk accesses. This is referred to as a write penalty.
FIG. 14(b) shows a schematic diagram of a RAID 5 System 200. Like the RAID 4 System 100, the RAID 5 System 200 employs a parity system for data restoration information. The RAID 5 System 200 also includes multiple disks D1, D2, . . . for storing data and parity, and a parity-generating unit P.
In the RAID 5 System 200, the data is divided into several groups as shown in FIG. 14(b), including A0, A1, . . . B0, B1, . . . The groups of divided data are distributed to disks D1, D2, . . . respectively and stored therein. The parity PA, of the data A0, A1 . . . and the parity PB of the data B0, B1 . . . are distributed to the disks D1, D2, D3 . . . and stored.
In the RAID 5 System 200, as with the RAID 4 System 100, when there is a problem with a disk, the data on the disk is regenerated from the aforementioned identical group data and parity (parity data).
RAID 5 is capable of reading and writing multiple disks simultaneously. When updating the data, there is the aforementioned write penalty. Also, while updating the parity, no read/write access is allowed to the disk.
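The difference in parity placement between the two systems can be sketched as follows. The dedicated parity disk of RAID 4 follows from the description above. The particular rotation used here for RAID 5 is only an assumption for illustration, since the description states only that the parity is distributed across the disks. The function name `parity_disk` and the disk indices are hypothetical.

```python
def parity_disk(stripe: int, n_disks: int, level: int) -> int:
    """Index of the disk that holds the parity of a given stripe."""
    if level == 4:
        # RAID 4: every stripe keeps its parity on the same dedicated disk.
        return n_disks - 1
    if level == 5:
        # RAID 5: parity moves from disk to disk (assumed rotation scheme).
        return (n_disks - 1 - stripe) % n_disks
    raise ValueError("only RAID 4 and RAID 5 are sketched here")


n = 4
raid4 = [parity_disk(s, n, 4) for s in range(4)]
raid5 = [parity_disk(s, n, 5) for s in range(4)]
assert raid4 == [3, 3, 3, 3]
assert raid5 == [3, 2, 1, 0]
```

With a dedicated parity disk, every write has to update that one disk, so writes are serialized. When the parity of different stripes sits on different disks, writes to different stripes can touch disjoint disks and proceed at the same time.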
FIG. 15 shows a diagram with an example of the write sequence in a disk array device to which the aforementioned RAID 4 System 100 or RAID 5 System 200 could be applied. The example of the write sequence shown in FIG. 15 corresponds to the RAID 4 System 100 and is explained with reference to the RAID 4 System 100.
As shown in FIG. 15, the RAID 4 System 100 includes a subsystem control module 101, which includes subsystem control module internal memory 101a. Also shown in FIG. 15, the RAID 4 System 100 includes subsystem internal interface module 102 (hereafter, "interface" is abbreviated to I/F), device control module 103, buffer 103a, device I/F module 104, disk group 105, data disks D0~D2, and redundant disk P storing redundant data.
Referring now to FIG. 15, OD (Old Data) is the data that is to be updated (referred to as old data below), OP (Old Parity) is the parity data to be updated (referred to as old parity below), ND (New Data) is the write data, NP (New Parity) is the write redundant data (referred to as new parity below) and IP is the interim parity data.
As shown in FIG. 15, the write operation in the disk array device 100 is carried out as explained herein below. (Items (a)~(g) in FIG. 15 correspond, respectively, to items (a)~(g) below.)
(a) The write data ND1 is transferred from the memory 101a of the subsystem control module 101 to the buffer 103a of the device control module 103.
(b) The data OD1 on the disk that is to be written to is read into the buffer 103a. 
(c) The redundant data OP of the redundant group of the data to be written is read into the buffer 103a. 
(d) The interim redundant data IP is generated by performing an exclusive "or" operation on OD1 and OP.
(e) The new redundant data NP is generated by performing an exclusive "or" operation on ND1 and IP.
(f) ND1 is written to the disk 105.
(g) NP is written to the disk 105.
For this sequence, items (a)~(c) and (e)~(f) do not have to be performed in any strictly fixed order.
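The write sequence (a) to (g) can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the disks are modeled as a dictionary of one-byte blocks, the buffer 103a as a second dictionary, and the names `small_write`, `xor`, and the sample values are hypothetical.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive OR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(disks: dict, data_disk: str, parity_disk: str, nd: bytes) -> None:
    buffer = {"ND": nd}                             # (a) write data into the buffer
    buffer["OD"] = disks[data_disk]                 # (b) read old data
    buffer["OP"] = disks[parity_disk]               # (c) read old parity
    buffer["IP"] = xor(buffer["OD"], buffer["OP"])  # (d) interim parity
    buffer["NP"] = xor(buffer["ND"], buffer["IP"])  # (e) new parity
    disks[data_disk] = buffer["ND"]                 # (f) write new data
    disks[parity_disk] = buffer["NP"]               # (g) write new parity


# Hypothetical redundant group: P holds D0 xor D1 xor D2.
disks = {"D0": b"\x01", "D1": b"\x02", "D2": b"\x04", "P": b"\x07"}
small_write(disks, "D1", "P", b"\x0a")

# The parity still equals the exclusive OR of all data disks.
assert disks["P"] == xor(xor(disks["D0"], disks["D1"]), disks["D2"])
```

The two reads in (b) and (c) and the two writes in (f) and (g) are the additional accesses behind the write penalty: updating one data unit costs four disk accesses, but the other data disks of the group are not touched.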
The following types of methods are possible for maintaining the reliability of the data when there has been a momentary interruption due to a power outage or other reason in systems that perform the above sort of write operation:
(1) Continuous subsystem operation by means of a battery back-up system for the entire device.
(2) Write data support based on non-volatile memory.
In (1) above, when the power supplied to the device is cut off, the data is secured by the continued operation of the subsystem. However, a large-capacity battery is required to back up the entire subsystem, and in actual installations the proportion of the device that this battery occupies is extremely large.
In (2) above, the write data remains in memory, which nearly always makes recovery possible by writing to the disk again when the power supply is turned back on. However, suppose that the power supply is cut off while a write is being carried out and the redundant data is being written, and that the RAID is in degeneration mode (that is, at least one disk has failed) or shifts into degeneration mode when the power supply is turned on again. In that case, the redundancy of that redundant group is lost, and it is not possible to properly restore the data on the broken disk or to write the data of that redundant group. This state is referred to as a "Write Hole".
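The state described above can be reproduced with a small sketch, assuming exclusive-OR parity and one-byte blocks. The variable names and values are hypothetical.

```python
def xor(*blocks: bytes) -> bytes:
    """Bytewise exclusive OR of any number of equally sized blocks."""
    out = bytes(len(blocks[0]))
    for block in blocks:
        out = bytes(x ^ y for x, y in zip(out, block))
    return out


# One redundant group; the parity is consistent with the data.
d0, d1, d2 = b"\x01", b"\x02", b"\x04"
p = xor(d0, d1, d2)

# The disk holding d2 has failed (degeneration mode), so its content
# exists only implicitly as d0 xor d1 xor p.
assert xor(d0, d1, p) == d2

# New data reaches the disk holding d1, but the power is cut off
# before the new parity is written: p is now stale.
d1 = b"\x0a"

# After power-up, regenerating the broken disk yields wrong data.
regenerated = xor(d0, d1, p)
assert regenerated != d2
```

Rewriting the new data from non-volatile memory does not help here, because the stale parity can no longer be recomputed without the content of the broken disk.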
To correct this sort of problem, the redundant data can be managed constantly in the memory 101a, with the status of the write progress also written to the memory 101a, so that the progress status is used together with the redundant data to perform the recovery. This way of avoiding the above state has been considered, but the need to constantly transfer redundant data while writing leads to a drop in performance.
The present invention solves the above-mentioned problems.
An object of the present invention is to provide a disk array device that maintains reliability of data without too great a loss of performance, even in degeneration mode, as well as when the power supply is turned back on after being turned off.
The present invention comprises a disk array device including a subsystem control module, disks, and a device control module. The subsystem control module comprises a memory backed up with a battery. The disks store data and/or parity. The device control module controls the disks. The device control module comprises a buffer storing redundant data, wherein when data is to be written to the disks, the disk array device allocates and writes the data to the respective disks, generates redundant data from the allocated data, and writes the redundant data onto those of the disks not storing the allocated data, and wherein when writing data to the disks, the data written to the memory of the subsystem control module is held until a writing process is completed and when at least one of the disks is broken or degenerated, the disk array device transfers and stores the redundant data stored in the buffer of the device control module to the memory of the subsystem control module.
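The arrangement described above can be illustrated with a rough sketch. It is not the claimed implementation: the class `DiskArraySketch`, the `PowerLoss` exception, the `recover` step, and all values are hypothetical. The sketch assumes that the battery-backed memory survives a power cut and that recovery simply writes the held blocks to the disks again.

```python
class PowerLoss(Exception):
    """Stands in for a power cut in the middle of a write."""


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class DiskArraySketch:
    def __init__(self, disks: dict, parity_name: str, degraded: bool):
        self.disks = disks            # surviving disks: name -> block
        self.parity_name = parity_name
        self.degraded = degraded      # True when a disk of the group is broken
        self.memory = {}              # battery-backed memory (assumed to survive)

    def write(self, target: str, nd: bytes, cut_power: bool = False) -> None:
        self.memory["ND"] = (target, nd)      # held until the write completes
        ip = xor(self.disks[target], self.disks[self.parity_name])
        new_parity = xor(nd, ip)              # generated in the buffer
        if self.degraded:
            self.memory["NP"] = new_parity    # buffer -> battery-backed memory
        self.disks[target] = nd
        if cut_power:
            raise PowerLoss()                 # parity on disk is now stale
        self.disks[self.parity_name] = new_parity
        self.memory.clear()                   # write completed: release

    def recover(self) -> None:
        """Assumed power-up recovery: write the held blocks again."""
        if "ND" in self.memory:
            target, nd = self.memory["ND"]
            self.disks[target] = nd
        if "NP" in self.memory:
            self.disks[self.parity_name] = self.memory["NP"]
        self.memory.clear()


# The disk D2 (content 0x04) is broken; P holds D0 xor D1 xor D2.
array = DiskArraySketch({"D0": b"\x01", "D1": b"\x02", "P": b"\x07"}, "P", degraded=True)
try:
    array.write("D1", b"\x0a", cut_power=True)
except PowerLoss:
    pass
array.recover()

# The broken disk can still be regenerated correctly.
rebuilt_d2 = xor(xor(array.disks["D0"], array.disks["D1"]), array.disks["P"])
assert rebuilt_d2 == b"\x04"
```

In this sketch the redundant data is copied to the memory only when the array is degraded, so a healthy array does not pay for the extra transfer on every write.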
Further, the present invention comprises a disk array device coupled to and receiving data from a host computer. The disk array device of the present invention comprises a subsystem control module and a device control module. The subsystem control module of the present invention comprises a memory having a backup power supply. The memory stores data redundant to the data received from the host computer. The device control module of the present invention is in communication with the subsystem control module. The device control module interfaces to and controls access of a disk drive group storing data and parity on disk drives. The memory of the subsystem control module stores the redundant data until the device control module notifies the subsystem control module that the device control module has successfully written the data received from the host computer to the disk drive group.
In addition, the present invention comprises a method of a disk array device controlling disks storing data and coupled to a host computer. The method of the present invention comprises storing data redundant to data received from the host computer, in a memory of the disk array device having a backup power supply, until the data received from the host computer is successfully written to the disks by the disk array device.
These together with other objects and advantages which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout.