(1) Field of the Invention
This invention relates to systems in which multiple controllers are used to control an array of storage devices.
(2) Description of Related Art Including Information Disclosed Under 37 CFR 1.97 and 37 CFR 1.98.
The acronym RAID refers to systems which combine disk drives for the storage of large amounts of data. In RAID systems the data are recorded by dividing each disk into stripes and interleaving the data so that the combined storage space consists of stripes from each disk. RAID systems fall under five different architectures, plus one additional type, RAID-0, which is simply an array of disks and does not offer any fault tolerance. RAID 1-5 systems use various combinations of redundancy, spare disks, and parity analysis to maintain the reading and writing of data in the face of one and, in some cases, multiple intermittent or permanent disk failures. Ridge, P. M., The Book of SCSI: A Guide for Adventurers, Daly City, Calif.: No Starch Press, 1995, pp. 323-329. In this application, a RAID system consisting of one host computer, one controller, and an array of multiple channels, each channel consisting of several direct access storage devices in serial electrical connection, will be termed a "single RAID subsystem".
Conventional RAID systems guard against failure of a controller by means of the active-active system. This system consists of two single RAID subsystems, each with a host computer, a controller, and an array of direct access storage devices. The direct access storage devices, in the most common case disks, are arranged in channels in which the disks are connected in series. A common arrangement is for one controller to control six channels with five disks in each channel. In the active-active system, each channel of one subsystem is connected electrically to a channel of the other subsystem. This means that, in the event of the failure of one controller, the other controller can serve all 10 disks in each "double" channel. Unfortunately, during normal operation, when both controllers are operating, there is interference because two controllers are simultaneously accessing a double channel of ten disks. This interference reduces the speed of a normally operating active-active system to about 130% of the speed of a single RAID subsystem, rather than the 200% expected from the operation of two single RAID subsystems.
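By way of illustration only, the arithmetic in the preceding paragraph can be sketched as follows. The sketch takes the "about 130%" figure and the five-disks-per-channel arrangement directly from the text; the function and constant names are hypothetical and are introduced here only for the example.

```python
# Illustrative arithmetic only; names are hypothetical, figures come from the text.
SINGLE_SUBSYSTEM_SPEED = 1.00   # baseline: one single RAID subsystem (100%)
ACTIVE_ACTIVE_SPEED = 1.30      # "about 130%" for a conventional active-active pair
DISKS_PER_CHANNEL = 5           # the "common arrangement" described above


def ideal_speed(n_subsystems: int) -> float:
    """Speed expected if n subsystems ran with no mutual interference."""
    return n_subsystems * SINGLE_SUBSYSTEM_SPEED


def efficiency(observed_speed: float, n_subsystems: int) -> float:
    """Fraction of the ideal speed that is actually obtained."""
    return observed_speed / ideal_speed(n_subsystems)


# Two channels of five disks joined together form one "double" channel.
disks_per_double_channel = 2 * DISKS_PER_CHANNEL

# Speed lost to interference, in units of one single RAID subsystem.
interference_loss = ideal_speed(2) - ACTIVE_ACTIVE_SPEED

print(disks_per_double_channel)
print(round(efficiency(ACTIVE_ACTIVE_SPEED, 2), 2))
print(round(interference_loss, 2))
```

Under these figures the conventional pair delivers roughly 65% of the speed of two independent subsystems.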
U.S. Pat. No. 5,768,623 discloses a system for storing data for several host computers and several storage arrays which are linked so that each storage array can be accessed by any host computer. The system uses dual-ported disks and involves serial communication channels. No switches or repeaters are used to isolate the disk arrays during normal functioning of the host computers and storage array controllers.
U.S. Pat. No. 5,729,763 discloses a system for storing data in which each of a number of disk interfaces is coupled to a corresponding disk drive by unidirectional channels. Each disk interface includes a unidirectional switch. Use of the switches allows a defective disk drive or switch to be removed without requiring shut-down of the entire system.
The RAID systems of the prior art do not provide the advantage of the present invention, namely increasing the overall speed of N same-speed single RAID subsystems to N times the speed of a single RAID subsystem under normal conditions, while providing for the sharing of multiple storage devices under conditions in which a host computer or storage array controller fails.
The system of the present invention is like the conventional active-active system except that it incorporates a switch or repeater which isolates the channels of the two or more single RAID subsystems when all the host computers and controllers are functioning properly. If three same-speed single RAID subsystems are included, for example, the system functions at 300% of the speed of a single RAID subsystem during the vast preponderance of the time, when all of the host computers and storage array controllers are functioning properly. In the case of a host computer or storage array controller failure, however, the bidirectional switch or bidirectional repeater closes and establishes electrical connection between the single RAID subsystem with the failure and the single RAID subsystem adjacent to it in the system. In this configuration the two affected single RAID subsystems together have the speed expected of a conventional active-active system after a host computer or storage array controller failure, that is, about 100% of the speed of an individual RAID subsystem. The remaining unaffected single RAID subsystems continue to operate at the unhindered maximum speed.
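The speed figures described above may be sketched, purely as an illustration, with the short model below. It assumes same-speed subsystems and at most one failure at a time. The function names and the choice of which adjacent subsystem takes over (the next one, wrapping around at the end) are assumptions made for the example and are not taken from the text.

```python
# Minimal sketch under stated assumptions; not a definitive implementation.
from typing import Optional, Tuple


def closed_switch(n_subsystems: int, failed_index: int) -> Tuple[int, int]:
    """Return the pair of subsystems joined when the switch closes.

    Assumption: the failed subsystem is joined to the next subsystem,
    wrapping around at the end. The text says only "adjacent".
    """
    if not 0 <= failed_index < n_subsystems:
        raise ValueError("failed_index out of range")
    return failed_index, (failed_index + 1) % n_subsystems


def relative_speed(n_subsystems: int, failed_index: Optional[int] = None) -> float:
    """Total speed in units of one single RAID subsystem (1.0 == 100%)."""
    if n_subsystems < 2:
        raise ValueError("at least two single RAID subsystems are required")
    if failed_index is None:
        # All switches open: the subsystems are isolated, so speeds simply add.
        return float(n_subsystems)
    closed_switch(n_subsystems, failed_index)  # validates failed_index
    # The two affected subsystems share one working host/controller and
    # together run at about the speed of one subsystem.
    affected_pair_speed = 1.0
    unaffected_speed = float(n_subsystems - 2)
    return affected_pair_speed + unaffected_speed


print(relative_speed(3))                  # normal operation, three subsystems
print(relative_speed(3, failed_index=1))  # one host or controller has failed
print(closed_switch(3, failed_index=2))   # which pair is joined
```

For three subsystems the model gives 300% in normal operation and 200% after a single failure: about 100% for the affected pair plus 100% for the unaffected subsystem.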