The present invention relates to a disk-array controller for a mass magnetic disk storage, a mass optical disk storage, or the like, and especially to a disk-array controller having an array input/output control unit for a plurality of disk units of a computer system.
Heretofore, disk-array controllers of this nature have been used to improve system performance and cost performance. Here, a plurality of inexpensive small disk units are substituted for a single expensive large disk unit, and the set of small disk units can be made to appear as a single expensive, large, high-speed disk unit.
Several methods for designing a disk array have been described, for example, in David A. Patterson, Garth Gibson and Randy H. Katz, "A Case for Redundant Arrays of Inexpensive Disks (RAID)", University of California, Berkeley, Report No. UCB/CSD 87/391 (December 1987).
In this article, methods of constructing disk arrays with redundancy are classified into five levels of RAID (Redundant Arrays of Inexpensive Disks) systems. The article describes an array of five disk units as an example of such levels.
A first-level RAID (RAID 1) system uses disk units (N in total) for storing data and mirror disk units (N in total) for mirroring the data. The RAID 1 system keeps a duplicate copy of all information by writing each piece of information both to a data disk unit and to its mirror disk unit.
A second-level RAID (RAID 2) system provides a redundant disk configuration using a Hamming code, which is one type of error-correcting code (ECC), for example with four data disk units and three ECC disk units. However, the RAID 2 system has rarely been implemented because of its high level of redundancy.
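The four-data, three-ECC arrangement mentioned above matches the dimensions of a Hamming(7,4) code. The following is a minimal sketch under that assumption; the bit ordering, the function names `encode` and `decode`, and the idea of treating one bit per disk unit are illustrative choices and are not taken from the cited article.

```python
def encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword.

    Positions 1, 2 and 4 hold check bits; positions 3, 5, 6 and 7 hold data.
    (Assumed layout: one bit per disk unit, 4 data disks + 3 ECC disks.)
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit and return (data_bits, error_position).

    error_position is 0 when no error was detected, otherwise 1..7.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


original = [1, 0, 1, 1]
stored = encode(original)
damaged = list(stored)
damaged[4] ^= 1  # simulate a single-bit error on the fifth disk unit
recovered, error_position = decode(damaged)
```

Three of the seven units carry only check bits in this sketch, which illustrates the high level of redundancy noted above.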
A third-level RAID (RAID 3) system comprises a group (rank) of disk units (N+1 in total). In this case, each data block is divided into N chunks, and the chunks are distributed across N different data disk units and stored therein. So that the data can be read without any loss when one of the disk units fails, parity information corresponding to the divided chunks is also stored in a dedicated parity disk unit.
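The parity scheme described for the RAID 3 system can be sketched as follows. This is a minimal illustration, assuming byte-wise XOR parity and a block length divisible by N; the helper names (`split_block`, `parity_of`, `rebuild`) are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_block(block, n):
    """Divide a data block into n equal chunks (length assumed divisible by n)."""
    size = len(block) // n
    return [block[i * size:(i + 1) * size] for i in range(n)]


def parity_of(chunks):
    """Parity chunk stored on the dedicated parity disk unit."""
    return reduce(xor_bytes, chunks)


def rebuild(chunks, parity, failed):
    """Recover the chunk of a failed data disk from the survivors and the parity."""
    survivors = [c for i, c in enumerate(chunks) if i != failed]
    return reduce(xor_bytes, survivors, parity)


block = b"ABCDEFGHIJKL"
chunks = split_block(block, 4)          # N = 4 data disk units
parity = parity_of(chunks)              # stored on the (N+1)-th disk unit
recovered = rebuild(chunks, parity, failed=2)
```

Because XOR is its own inverse, XORing the parity with the surviving chunks leaves exactly the missing chunk.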
A fourth-level RAID (RAID 4) system also comprises a group of disk units (N+1 in total), but it differs from the RAID 3 system in that the data are divided into data blocks and each block is stored on a single data disk unit without being spread across several disks. Thus, only one disk unit is used when reading the data, so that total throughput is increased by accessing the disk units independently when each data transfer is short. The RAID 4 system further comprises a parity disk unit, as in the case of the RAID 3 system. At the time of writing, however, four steps are required for updating the parity information (reading the old data and the old parity, and writing the new data and the new parity). Moreover, the parity disk unit is accessed whenever any of the data disk units is updated, so that it tends to become a bottleneck in writing. Accordingly, few cases of using the RAID 4 system have been reported.
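The four-step update and the resulting load on the parity disk unit can be sketched as below. This is an illustrative model only: the class `Raid4`, its access counter, and the block sizes are assumptions made for the example, and the update rule used is the usual XOR identity (new parity = old parity XOR old data XOR new data).

```python
from collections import Counter


def xor_bytes(a, b):
    """Byte-wise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class Raid4:
    """Toy RAID 4 array: n_data data disk units plus one dedicated parity disk unit."""

    def __init__(self, n_data, n_blocks, block_size):
        self.data = [[bytes(block_size)] * n_blocks for _ in range(n_data)]
        self.parity = [bytes(block_size)] * n_blocks
        self.accesses = Counter()  # disk name -> number of read/write accesses

    def write(self, disk, block, new_data):
        old_data = self.data[disk][block]          # step 1: read old data
        self.accesses[f"data{disk}"] += 1
        old_parity = self.parity[block]            # step 2: read old parity
        self.accesses["parity"] += 1
        self.data[disk][block] = new_data          # step 3: write new data
        self.accesses[f"data{disk}"] += 1
        self.parity[block] = xor_bytes(xor_bytes(old_parity, old_data), new_data)
        self.accesses["parity"] += 1               # step 4: write new parity


array = Raid4(n_data=4, n_blocks=2, block_size=4)
for disk in range(4):
    array.write(disk, block=0, new_data=bytes([disk + 1] * 4))
```

In this run each data disk unit is accessed twice, while the single parity disk unit is accessed on every write, which is the bottleneck described above.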
A fifth-level RAID (RAID 5) system has almost the same configuration as the RAID 4 system, except for how the parity information is handled. In the RAID 5 system, parity information is not concentrated in one disk unit but is distributed across the disk units (N+1 in total) in correspondence with the distribution of data. The above article discloses an example of the RAID 5 system, but the disadvantage remains that a parity update requires four steps. In that example, the write is first performed on a nonvolatile memory unit so that completion of the write operation can be reported to the host, and the parity is actually updated later, during spare time.
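One way to picture the distributed parity of the RAID 5 system is a rotation of the parity position from stripe to stripe. The sketch below assumes the simplest possible rotation (stripe number modulo the number of disk units); the cited article and actual controllers may use a different placement, and the function names are hypothetical.

```python
def parity_disk(stripe, total_disks):
    """Index of the disk unit holding parity for a stripe (assumed simple rotation)."""
    return stripe % total_disks


def layout(num_stripes, total_disks):
    """Return one row per stripe, marking parity ('P') and data ('D') positions."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, total_disks)
        rows.append(["P" if disk == p else "D" for disk in range(total_disks)])
    return rows


rows = layout(num_stripes=5, total_disks=5)  # five disk units, as in the article's example
for stripe, row in enumerate(rows):
    print(stripe, " ".join(row))
```

With this placement every disk unit holds parity for one stripe in five, so parity writes are spread over all disk units instead of a single one.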
Regarding the above RAID 1-5 systems, for example, Japanese Patent Application Laid-Open No. 6-180623 (1994) discloses a multiplex data bus architecture for a disk-array controller. FIG. 20 is a block diagram that shows an example of the conventional disk-array controller. The disk-array controller comprises a data bus architecture which can be adjusted to execute a data-transfer operation between a host device and a plurality of disk units. The disk units are arranged as a disk array with RAID 1, 3, 4, or 5.
By way of multiplexers 135-140, an exclusive-OR gate circuit 134 (hereinafter also referred to as an XOR circuit), which is provided as a circuit for generating parity, can receive:
data from the host device through double registers 110-114 and a host SCSI adapter 143, which is provided as a DMA interface for communicating between a host device system data path 144 connecting to the host device and SCSI data paths 141, 142;
data from the disk units through SCSI bus interface chips 128-132 connecting to their respective disk units; and
data from a static RAM (SRAM) 133.

the number of XOR circuits (parity-generating circuits) is one;
the number of inputs to the XOR circuit is the sum of the number of SCSI buses on the side of the disk units and the number of SRAMs; and
an inability to generate feedback from an output of the XOR circuit to an input of the XOR circuit. This inability arises because the settings described above cannot be changed during the transmission, as pointed out in the first problem, and if the output of the XOR circuit is set to feed back to the input of the XOR circuit, the feedback continues without termination.

data buffers accessible from any of a host device and disk units;
exclusive-OR circuits for carrying out exclusive-OR operations;
circuits for checking redundant parity, where outputs of the exclusive-OR circuits are determined as true when the outputs are all zero "0";
selectors for selecting outputs from the host device or the disk units and outputs from the exclusive-OR circuits and transferring selected outputs to the data buffers;
selectors for selecting whether or not to transport outputs from each of the data buffers to the exclusive-OR circuits;
selectors for selecting whether or not to transport outputs from the host device or the disk units;
selectors for selecting whether or not to transport all one "1" to the exclusive-OR circuits to reverse outputs of the XOR circuit; and
transfer buses for transferring outputs of the exclusive-OR circuits to the host device and the disk units,
the transfer buses are determined by adjusting a selection of the selectors at the time of transmission.

a plurality of set registers for retaining read/write execution signals of each of the data buffers, select signals of each of the selectors, and a setting whether or not to check the redundant parity every time transmission occurs; and
execution counters for retaining an execution position of such a setting, wherein
settings of the transfer buses are preliminarily programmed,
local buses that perform time sharing processing are used as PCI buses, and
the transfer buses are adjusted in accordance with the program during a period of acquiring access of the local buses.

a first bus connecting to the host device;
a second bus connecting to a plurality of the disk units;
a plurality of data buffers connecting between the first bus and the second bus through input selectors and output selectors;
a first exclusive-OR circuit in which output of the output selector, output of the first bus, output of a selector which outputs odd or even are provided as inputs;
a second exclusive-OR circuit in which output of the output selector, output of the second bus, output of a selector which outputs odd or even are provided as inputs; and
a control unit for controlling the selections of the input selectors and the output selectors, wherein
outputs of the first exclusive-OR circuit and the second exclusive-OR circuit are stored in any of a plurality of the data buffers.

the control unit controls the selections of the input selectors and the output selectors with reference to a RAID level and the first bus, the second bus are PCI buses, and a SCSI interface chip is provided between the host device and the first bus, and also SCSI interface chips are provided between a plurality of the disk units and the second bus;
the input selectors connecting to the first bus and output of the first exclusive-OR circuit produce outputs of data to any of the data buffers under the control of the control unit, outputs of the data buffers enter into the first exclusive-OR circuit through the output selectors under the control of the control unit;
the input selectors connecting to the second bus and output of the second exclusive-OR circuit produce outputs of data to any of the data buffers under the control of the control unit, outputs of the data buffers enter into the second exclusive-OR circuit through the output selectors under the control of the control unit;
the input selectors connecting to the first bus and output of the first exclusive-OR circuit produce outputs of data to any of the data buffers under the control of the control unit, outputs of the data buffers enter into the second exclusive-OR circuit through the output selectors under the control of the control unit to supply it to the second bus for storing it into one of the plurality of disk units; and
the input selectors connecting to the second bus and output of the second exclusive-OR circuit produce outputs of data to any of the data buffers under the control of the control unit, outputs of the data buffers enter into the first exclusive-OR circuit through the output selectors under the control of the control unit to input it to the host devices.
Furthermore, outputs from the XOR circuit 134 can be transferred to: the host device through the double registers 110-114 and the host SCSI adapter 143; the disk units through three-state buffers 115-119 and the SCSI bus interface chips 128-132; and the SRAM 133 through a three-state buffer 120. Therefore, a series of data paths from the host device to the disk units can be provided by independently setting whether or not each of those paths is to be used.
A writing operation with the RAID 5 system will be described as an example of using the disk-array controller shown in FIG. 20. A writing operation with the RAID 5 system involves both read and write procedures. More concretely, old data and old parity are read out, while new data and new parity are written. The operation will be described on the assumption that the data is written to the disk unit of channel 2 while the parity information is updated on the disk unit of channel 1. Initially, information is read out from each of the disk units of channels 1 and 2 and then provided to the XOR circuit 134 (i.e., the parity-generating circuit) through the multiplexers 135, 136. The output from the XOR circuit 134 is stored in the external SRAM 133 through a bus 126, the enabled three-state buffer 120, and a bus 127. Subsequently, the double register 110 receives new data from the host device, and the new data is written on the disk unit of channel 2 through a bus 122 and the SCSI bus interface. The new data is also provided to the XOR circuit 134. The information written in the SRAM 133 is also provided to the XOR circuit 134 through the multiplexer 140. The output of the XOR circuit (i.e., the parity-generating circuit) 134 is the new parity. The new parity can be provided to the disk units through the enabled three-state buffer 115, a bus 121, and the enabled SCSI bus interface chip 128.
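The two passes through the XOR circuit described above can be modeled in a few lines. This is a behavioral sketch only, not a description of the hardware: the dictionary of channels, the function name `raid5_write_two_pass`, and the presence of a third data channel are assumptions made for the example.

```python
def xor_bytes(a, b):
    """Byte-wise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def raid5_write_two_pass(disks, data_ch, parity_ch, new_data):
    """Model of the write sequence: intermediate XOR held in SRAM, then new parity."""
    # Pass 1: old data and old parity are read and XORed; the result goes to the SRAM.
    sram = xor_bytes(disks[data_ch], disks[parity_ch])
    # Pass 2: new data is written to the data channel and also fed to the XOR
    # circuit together with the SRAM contents; the output is the new parity.
    disks[data_ch] = new_data
    disks[parity_ch] = xor_bytes(new_data, sram)
    return sram


# Channel 1 holds parity, channels 2 and 3 hold data (channel 3 is an assumed extra disk).
disks = {2: b"\x0f\x0f", 3: b"\xf0\x01"}
disks[1] = xor_bytes(disks[2], disks[3])

raid5_write_two_pass(disks, data_ch=2, parity_ch=1, new_data=b"\xaa\x55")
```

After the call, the parity on channel 1 again equals the XOR of the data on channels 2 and 3, even though channel 3 was never read.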
However, the process described above has the following problems. The first problem is that multiple different transmissions cannot be carried out in parallel. The data transmission to the host device is carried out without any break, so the settings of the double registers, the three-state buffers, the multiplexers, and so on must remain fixed during the data transmission. If any of the settings is changed in the course of the data transmission, the data does not flow correctly. Moreover, the system is equipped with only one XOR circuit (i.e., one parity-generating circuit). If data transmissions using the XOR circuit are started concurrently, they must be carried out in succession.
In addition, the second problem is that an XOR calculation cannot be performed on the data in one operation if the number of data items exceeds the number of inputs of the XOR circuit (the parity-generating circuit). The XOR operation is necessary for data restoration or the like. Where the group includes many disk units and the XOR calculations are therefore divided and executed in sequence, it takes much time to complete the XOR operation due to the following facts: