1. Field of the Invention
The present invention relates to an arrayed recording apparatus used as a storage system of a computer. More particularly, it relates to improving the efficiency and the reliability of a disk drive system comprising arrayed disk drives.
2. Description of the Related Art
Various papers and patents concerning disk systems comprising arrayed disk drives have been published. Among them is a paper from the University of California at Berkeley concerning a system that dramatically improves the reliability of a large amount of stored data: "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Proc. ACM SIGMOD Conf., Chicago, Ill., June 1988. This paper classifies systems for improving data reliability into five levels, from the conventional mirrored-disk system to the block-interleaved-parity system. They are summarized as follows.
RAID level 1
This is the normal mirrored (shadowed) system, in which the same data are stored in two groups of disk drives. RAID level 1 has conventionally been used in systems that require high reliability. However, the cost per unit of capacity is high because of the large redundancy.
RAID level 2
Data are bit-interleaved across the data disks of a redundant group, using a Hamming code of the kind used in DRAM. ECC codes are written to a plurality of check disks per group so that a bit error can be corrected. A group comprises about 10 to 25 data disks; for example, four check disks are needed for ten data disks. The redundancy is therefore still fairly large.
RAID level 3
Data are byte-interleaved across the data disks of the group, with a dedicated parity disk. Only one parity disk is needed because the position of an error is detected from the ECC at each drive. This level is suited to drives with synchronized spindle rotation, which transfer data at high speed.
RAID level 4
Data are block-interleaved across the data disks of the group, with a dedicated parity disk. Unlike level 3, the block is used as the interleaving unit for data storage, so this level is suitable when there are frequent accesses to small amounts of data.
RAID level 5
Unlike levels 3 and 4, this level has no dedicated parity disk; the parity data are distributed and stored evenly across all disks (striping). Accordingly, write accesses are not concentrated on a single parity disk drive, so that the number of I/O operations per second (IOPS) is increased. (This level is more effective than level 4 when the ratio of writes is high.) This level provides both high processing performance and good storage efficiency.
"Arrayed recording apparatus system and method" by Array Technology Corporation in U.S.A., disclosed in the Unexamined Japanese Patent Publication No. 2-236714, is known as an example of a conventional redundant arrayed recording apparatus. In this Array Technology Corporation case, it is possible to select the redundancy level and the logical number of disks in the disk drive configuration recognized and accessed by the host computer.
The system of distributing and storing the parity data evenly across all disks (striping) is disclosed in the Unexamined Japanese Patent Publication No. 62-293355, "Data Protection System" by International Business Machines Corporation in U.S.A.
FIG. 31 shows a configuration of the conventional arrayed recording apparatus disclosed in the Unexamined Japanese Patent Publication No. 2-236714. The apparatus of FIG. 31 is composed of the following. A host interface 2 (called a host I/F, hereinafter) is a buffer between a host computer 1 and an array controller 13. A microprocessor 3 controls the array controller 13. An EOR engine 5 generates redundant data and recovers data. A data bus 6 interconnects the host I/F 2, the microprocessor 3, a memory 4 and the EOR engine 5. ACE panel 7 and plural channel controllers 8a, . . . , 8e are also connected to the data bus 6. The array controller 13 controls a plurality of disk drives 9a, . . . , 9e. The disk drives 9a, . . . , 9e are connected to the channel controllers 8a, . . . , 8e through a channel 10.
FIG. 32 explains the generation of the redundant data in a RAID. As shown in the figure, one disk out of five stores the redundant data (parity) of the other four disks. The parity is obtained by calculating the exclusive OR (XOR) of the data of the four disks. That is, the parity data of the parity disk P are obtained from the data of a disk 0, a disk 1, a disk 2 and a disk 3 by calculating their XOR. Holding such a parity as the redundant data makes it possible to recover data. For instance, when the data of the disk 0 cannot be read out because of some fault, the data of the disk 0 can be recovered by calculating the exclusive OR (XOR) of the data of the disks 1, 2, 3 and the parity disk P.
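The parity generation and recovery described with respect to FIG. 32 can be sketched as follows. This is a minimal illustration only: the helper name `xor_blocks` and the four-byte disk contents are assumptions made for the example and do not appear in the cited publications.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the byte-wise XOR of equally sized blocks (hypothetical helper)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Illustrative contents of disk 0 .. disk 3 (assumed values, 4 bytes each).
disk0 = bytes([0x11, 0x22, 0x33, 0x44])
disk1 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk2 = bytes([0x01, 0x02, 0x03, 0x04])
disk3 = bytes([0xFF, 0x00, 0xFF, 0x00])

# Parity disk P holds the XOR of the four data disks.
parity = xor_blocks(disk0, disk1, disk2, disk3)

# If disk 0 cannot be read, XOR of disks 1, 2, 3 and P restores its data.
recovered = xor_blocks(disk1, disk2, disk3, parity)
assert recovered == disk0
```

The recovery works because XOR-ing a value with itself cancels it, so every surviving data block cancels out of the parity and only the missing block remains.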
In addition to the above method of calculating the parity as the XOR of the data of the four disks, there is another method as follows. The old data of the disk where data will be written and the current parity data stored in the parity disk are read, and the XOR of the new data, the old data and the current parity data is calculated to obtain new parity data. This method will be described with respect to FIG. 33. For instance, in the case of new data DN(2) being written into the disk 2, old data are first read from the disk 2 as data DO(2). Simultaneously, current parity data DO(P) are read from the parity disk. Then, new parity data DN(P) are obtained by calculating the XOR of the three data DN(2), DO(2) and DO(P). The new data DN(2) are stored in the disk 2. Finally, the new parity data DN(P) are stored in the parity disk.
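The update of FIG. 33 can be checked with a short sketch. The function names and the data values below are assumptions chosen for illustration; the variable names mirror DN(2), DO(2), DO(P) and DN(P) from the description above.

```python
def update_parity(new_data: bytes, old_data: bytes, old_parity: bytes) -> bytes:
    """DN(P) = DN(2) XOR DO(2) XOR DO(P), computed byte by byte."""
    return bytes(n ^ o ^ p for n, o, p in zip(new_data, old_data, old_parity))


def full_parity(blocks) -> bytes:
    """Parity recomputed from scratch as the XOR of all data blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


# Assumed contents of disks 0..3 before the write.
disks = [b"\x10\x20", b"\x03\x04", b"\x55\xaa", b"\x0f\xf0"]
do_p = full_parity(disks)          # current parity DO(P)

dn_2 = b"\x99\x66"                 # new data DN(2) for disk 2
do_2 = disks[2]                    # old data DO(2) read from disk 2
dn_p = update_parity(dn_2, do_2, do_p)

# Store the new data, then confirm both methods give the same parity.
disks[2] = dn_2
assert dn_p == full_parity(disks)
```

Only the target disk and the parity disk are read in this method, whereas recomputing the parity from scratch would require reading every other data disk of the group.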
The operation of the system shown in FIG. 31 will be described. In FIG. 31, all requests from the host computer 1 to store and to retrieve data are carried out through the host I/F 2. When data are stored, commands and data from the host computer 1 are temporarily stored in the memory 4 via the data bus 6. When data are retrieved, data stored in the memory 4 are sent to the host computer 1 through the host I/F 2.
Operation at RAID level 5 will be described with respect to FIGS. 31 and 34. When data are stored, the microprocessor 3 divides data stored in the memory 4 into data blocks, and determines disk drives in which the data will be written and in which redundant data will be written. At RAID level 5, the old data from the data blocks to be written are necessary for updating the redundant data, so a read operation is carried out before the write. Data are transferred between the memory 4 and channel controller 8 on the data bus 6, and redundant data are generated by the EOR engine 5 in synchronization with this data transfer.
As shown in FIG. 34, the data block size is assumed to be 512 bytes, for example. The blocks that store the parity are distributed across the disk drives, as shown by P1, P2, P3, . . . Such a recording arrangement is called striping. D11, D21, D31, D41 and P1 form a redundant group. D12, D22, D32, P2 and D51 also form a redundant group.
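A block layout of this kind can be generated with the sketch below. The rotation rule (the parity block moves one drive to the left with each successive redundant group) is an assumption inferred from the two groups named above; FIG. 34 itself may use a different rule for later groups, and the function name `layout` is hypothetical.

```python
def layout(n_groups, n_disks=5):
    """Return one row of block labels per redundant group.

    Assumed rule: group s (0-based) keeps its parity on drive
    (n_disks - 1 - s) mod n_disks. A data block is labelled
    D<drive><k>, where k counts the data blocks already on that drive.
    """
    data_count = [0] * n_disks  # data blocks placed so far on each drive
    rows = []
    for s in range(n_groups):
        parity_disk = (n_disks - 1 - s) % n_disks
        row = []
        for d in range(n_disks):
            if d == parity_disk:
                row.append(f"P{s + 1}")
            else:
                data_count[d] += 1
                row.append(f"D{d + 1}{data_count[d]}")
        rows.append(row)
    return rows


rows = layout(2)
assert rows[0] == ["D11", "D21", "D31", "D41", "P1"]
assert rows[1] == ["D12", "D22", "D32", "P2", "D51"]
```

Under this assumed rule every drive holds a parity block once in every five redundant groups, so parity updates are spread across all drives.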
In writing 1024 bytes of data, the data are divided and stored in two blocks D11 and D21, and the parity data P1 are also updated. The process is as follows. First, the disk drives 9a and 9b in FIG. 31 are selected for writing the data, and the disk drive 9e is selected for the redundant data. The EOR engine 5 is started under control of the microprocessor 3, and the channel controllers 8a, 8b and 8e, connected to the data disk drives 9a and 9b and the redundant data disk drive 9e, are commanded to read the old data so that the redundant data can be calculated. After the old data have been read from the data disk drives 9a and 9b and the redundant data disk drive 9e, the new data are written to the data disk drives 9a and 9b, and the updated redundant data generated by the EOR engine 5 are written to the redundant data disk drive 9e, at the direction of the microprocessor 3. As stated above, the process takes a long time because, when data are written, the old data must first be read in order to generate the redundant data.
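The cost of this write sequence can be made concrete with a small simulation. The `Drive` class, the `small_write` function and the payload are assumptions introduced for illustration; the sketch models only the order of reads and writes, not the channel controllers or the EOR engine hardware.

```python
BLOCK = 512  # block size in bytes, as assumed for FIG. 34


class Drive:
    """Hypothetical in-memory drive that counts its read and write accesses."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0
        self.writes = 0

    def read(self, n):
        self.reads += 1
        return self.blocks.get(n, bytes(BLOCK))  # unwritten blocks read as zeros

    def write(self, n, data):
        self.writes += 1
        self.blocks[n] = data


def xor(*blocks):
    out = bytearray(BLOCK)
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def small_write(data_drives, parity_drive, n, payload):
    if len(payload) != BLOCK * len(data_drives):
        raise ValueError("payload must fill one block per data drive")
    chunks = [payload[i:i + BLOCK] for i in range(0, len(payload), BLOCK)]
    # Step 1: read the old data and the old parity.
    old = [drive.read(n) for drive in data_drives]
    parity = parity_drive.read(n)
    # Step 2: fold each old/new pair into the parity.
    for old_block, new_block in zip(old, chunks):
        parity = xor(parity, old_block, new_block)
    # Step 3: write the new data and the updated parity.
    for drive, new_block in zip(data_drives, chunks):
        drive.write(n, new_block)
    parity_drive.write(n, parity)


drives = {name: Drive() for name in ("9a", "9b", "9c", "9d", "9e")}
payload = bytes(range(256)) * 2 + bytes([0xAA]) * BLOCK  # 1024 bytes
small_write([drives["9a"], drives["9b"]], drives["9e"], 0, payload)

total_reads = sum(d.reads for d in drives.values())
total_writes = sum(d.writes for d in drives.values())
assert (total_reads, total_writes) == (3, 3)
```

In this model the 1024-byte write costs three reads followed by three writes, which is the overhead the paragraph above identifies.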
Next the reading of data will be described. When the reading of data is requested by the host computer 1, the microprocessor 3 calculates the data blocks and data disk drives in which the data are stored. If the data are stored in a disk drive 9c, for example, a read command is issued to a channel controller 8c connected to the disk drive 9c. When the reading of data from the disk drive 9c has been completed, the data are transferred to the memory 4 and the host computer 1 is notified that the reading of data has been completed.
Data recovery, and data reproduction onto the standby disk, when a malfunction occurs will now be described. Data recovery is carried out when it has become impossible to read data from, for example, the disk drive 9c. In that case, the microprocessor 3 reads data from all the other disk drives of the redundant group that includes the data block concerned, and the EOR engine 5 recovers the data of the block that could not be read.
For example, when the redundant group is composed of the disk drives 9a, 9b, 9c, 9d and 9e, the data blocks are read out from the disk drives 9a, 9b, 9d and 9e. The data of the disk drive 9c are then recovered by the EOR engine 5 and transferred to the memory 4, and the host computer 1 is notified that the reading of data has been completed.
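The read path with recovery might be sketched as follows. The function `read_with_recovery`, the `failed` set and the block contents are hypothetical; the sketch assumes at most one drive of the redundant group is unreadable, which is the case described above.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def read_with_recovery(group, failed, target):
    """Return the block of `target`, rebuilding it if that drive has failed.

    `group` maps drive names to the blocks of one redundant group
    (parity included); `failed` is the set of unreadable drives.
    """
    if target not in failed:
        return group[target]  # normal read, no recovery needed
    others = [name for name in group if name != target]
    if any(name in failed for name in others):
        raise IOError("more than one drive of the redundant group has failed")
    # Read every other drive of the group and XOR the blocks together.
    return xor_blocks([group[name] for name in others])


# Assumed contents: 9a..9d hold data, 9e holds the parity of this group.
data = {"9a": b"AAAA", "9b": b"BBBB", "9c": b"CCCC", "9d": b"DDDD"}
group = dict(data)
group["9e"] = xor_blocks(list(data.values()))

recovered = read_with_recovery(group, failed={"9c"}, target="9c")
assert recovered == b"CCCC"
```

The sketch rejects a second failure in the same group because a single parity block can rebuild only one missing block.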
As stated above, it is possible to recover data even when reading has become impossible because of a malfunction in the disk drive. Accordingly, the reliability of data is improved.