Not Applicable.
(1) Field of the Invention
This invention relates to systems in which multiple controllers are used to control an array of storage devices.
(2) Description of Related Art Including Information Disclosed Under 37 CFR 1.97 and 37 CFR 1.98
The acronym RAID refers to systems which combine disk drives for the storage of large amounts of data. In RAID systems the data is recorded by dividing each disk into stripes, and the data are interleaved so that the combined storage space consists of stripes from each disk. RAID systems fall under 5 different architectures, plus one additional type, RAID-0, which is simply an array of disks and does not offer any fault tolerance. RAID 1-5 systems use various combinations of redundancy, spare disks, and parity analysis to preserve the reading and writing of data in the face of one and, in some cases, multiple intermittent or permanent disk failures. Ridge, P. M. The Book of SCSI: A Guide For Adventurers. Daly City, Cal.: No Starch Press, 1995, p. 323-329. In this application, a RAID system consisting of one host computer, one controller, and an array of multiple channels, each channel consisting of several direct access storage devices in serial electrical connection, will be termed a “single RAID subsystem”.
Conventional RAID systems guard against failure of a controller by the active-active system. This system consists of two single RAID subsystems, each with a host computer, a controller, and an array of direct access storage units. The direct access storage units, in the most common case, disks, are arranged in channels in which the disks are connected in series. A common arrangement is for one controller to control six channels of five disks in each channel. In the active-active system, each channel of one system is connected electrically to another channel in another system. This means that, in the event of the failure of one controller, the other controller can serve all 10 disks in each “double” channel. Unfortunately, during normal operation when both controllers are operating there is interference associated with the fact that two controllers are simultaneously accessing a double channel of ten disks. This interference reduces the speed of a normally acting active-active system to about 130% of the speed of a single RAID subsystem rather than the 200% of a single RAID subsystem expected from the operation of two single RAID subsystems.
U.S. Pat. No. 5,768,623 discloses a system for storing data for several host computers and several storage arrays which are linked so that each storage array can be accessed by any host computer. The system uses single-ported disks and Serial Storage Architecture (SSA) in a SSA disk array loop. Messages and data can travel either clockwise or counter-clockwise when traversing the loop. The bandwidth of such a loop is necessarily lower than that of a fibre channel configuration.
U.S. Pat. No. 5,812,754 discloses a RAID system which uses a fibre channel arbitrated loop to connect host computers and controllers as well as a separate fibre channel arbitrated loop to connect controllers and storage disks. In addition, a port bypass circuit is connected to each component in order to allow bypassing of any failed component so the operation of the loop is not affected by the failed component. Finally, in one embodiment, a star coupled RAID system with orthogonal data striping is described. In this embodiment defective components can be removed physically from the system. This system is considerably more expensive and slower in operation than the system of the present invention.
The RAID systems of the prior art do not provide the advantages of the present invention, that of inexpensively increasing the overall speed of N same-speed single RAID subsystems to N times the speed of a single RAID system under normal conditions while providing for the sharing of multiple storage devices during conditions in which a host computer or storage array controller fails. The present system maintains the high overall speed under normal conditions and provides host computer and controller redundancy without the expense of a switching system connecting the channels of storage devices and while taking advantage of the high speed associated with fibre channel loops and switch fabric configurations.
The system of the present invention is unlike the conventional active-active system because it uses a high bandwidth fibre channel arbitrated loop or switch fabric to connect the host computers and controllers. This provides redundancy in the case of any single computer or controller failure. In addition, since the present invention includes dual-ported storage devices, the failure of a storage device does not have a disruptive effect on the system. Each storage array controller (SAC) is designated a primary SAC for an array of storage units and as a secondary SAC for a different array of storage units. Each array of storage units is assigned to a primary SAC, which normally controls the array, and to a secondary SAC, which assumes the identity of the primary SAC upon failure of the primary SAC. Under normal conditions, each SAC controls only the array of storage units that it serves as primary SAC. Both the primary SAC and the secondary SAC are connected by separate loops to separate ports on the dual-ported storage devices. The combination of one primary SAC, its storage device array, and one secondary SAC which is potentially able to control the storage device array is termed a “storage array set”.
If three same-speed single RAID subsystems are included, for example, the system functions at 300% of the speed of a single RAID subsystem during the vast preponderance of the time when all of the host computers and SACs are functioning properly. In the case of a storage array controller or associated host computer failure, however, an intact host computer and SAC (the secondary SAC of the defective storage array set) takes over the operation of the failed system's array of storage devices. The intact secondary SAC assumes the identity or address of the failed controller and retains its own identity and duties to serve its own storage device array as the primary SAC. In this way, the intact system can address its own storage devices as well as those of the failed host computer or controller. In this configuration, after a host computer or SAC failure, the system has the speed expected of a conventional active-active system: about 100% of the speed of an individual RAID subsystem for the two affected single RAID subsystems. Any remaining unaffected single RAID subsystems continue to operate at the unhindered maximum speed.
The fibre channel loop and switch fabric configuration are becoming the industry standards for loop or serial interfaces, and SCSI has long been the industry standard for bus or parallel interfaces. The present invention is applicable for either the fibre channel disk array loop or SCSI interfaces for the host computers and SACs. In addition, the present invention is applicable to a switch fabric configuration.
The redundant RAID system of this invention extends the protection of the operation of a RAID system from providing for disk failure to providing for host computer or SAC failure. The invention comprises two or more (N) single RAID subsystems which are linked by a very wide bandwidth fibre channel loop or switch fabric configuration. Each SAC is designated a primary SAC for an array of storage devices to which it is linked by a loop connection to one port on each device. A second port on each device is used to link in a loop to a secondary SAC. The primary SAC normally controls the array of storage devices. In the event of failure of the primary SAC or associated host computer, the failure is detected by the secondary SAC, which then assumes the identity of the primary SAC, learns the identity and location of the affected array of storage devices, and serves this array as though it were the primary SAC.
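The takeover sequence described above can be sketched in Python as a small in-memory model. This is only an illustration under stated assumptions: the class names, the `fail_over` function, the string addresses such as `sac0`, and the ring layout in `build_ring` (SAC i+1 acts as secondary for array i) are all hypothetical and are not taken from the specification, which does not prescribe how failure is detected or how identities are represented.

```python
from dataclasses import dataclass, field


@dataclass
class StorageArrayController:
    """One SAC; `identities` holds the addresses it currently answers to."""
    address: str
    alive: bool = True
    identities: set = field(default_factory=set)

    def __post_init__(self):
        # A controller always answers to its own address.
        self.identities.add(self.address)


@dataclass
class StorageArraySet:
    """An array of dual-ported devices plus its primary and secondary SAC."""
    array_name: str
    primary: StorageArrayController
    secondary: StorageArrayController

    def serving_controller(self):
        """Return the SAC that currently serves this array, or None."""
        if self.primary.alive:
            return self.primary
        took_over = self.primary.address in self.secondary.identities
        if self.secondary.alive and took_over:
            return self.secondary
        return None


def fail_over(array_set):
    """Secondary SAC assumes the failed primary's identity, keeping its own."""
    if array_set.primary.alive or not array_set.secondary.alive:
        return False  # nothing to take over, or nobody left to do it
    array_set.secondary.identities.add(array_set.primary.address)
    return True


def build_ring(n):
    """Hypothetical layout: SAC i is primary for array i, SAC i+1 its secondary."""
    sacs = [StorageArrayController(f"sac{i}") for i in range(n)]
    sets = [StorageArraySet(f"array{i}", sacs[i], sacs[(i + 1) % n])
            for i in range(n)]
    return sacs, sets
```

For example, after `sacs, sets = build_ring(3)`, marking `sacs[0].alive = False` and calling `fail_over(sets[0])` leaves `sacs[1]` answering to both its own address and that of the failed controller, while it remains the primary SAC for its own array.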
Thus the system normally functions as (N) independent single RAID subsystems and functions at the speed of one single RAID subsystem multiplied by N if the single RAID subsystems all have the same speed. If the speeds of the single RAID subsystems vary, the system normally functions at a speed which is the sum of the speeds of the single RAID subsystems. In the event of a host computer or primary SAC failure, the secondary SAC controls a double set of storage array devices. This causes interference in transmission of data to the storage devices and slows the speed of the system. The functioning controller thus takes over the function of the disabled controller and provides continuing service, albeit at a reduced speed. The unaffected single RAID subsystems of the redundant RAID system of this invention continue to function unhindered.
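The speed figures stated in this summary can be checked with a short arithmetic sketch. The function names are hypothetical. The default `pair_speed=1.0` encodes the statement that a failed subsystem and the intact subsystem covering it together run at about 100% of one single RAID subsystem; the assumption that each intact SAC covers at most one failed subsystem is ours and is not spelled out in the text.

```python
def normal_speed(subsystem_speeds):
    """Speed with no failures: the sum of the single RAID subsystem speeds."""
    return sum(subsystem_speeds)


def degraded_speed(n, failed, pair_speed=1.0):
    """Aggregate speed for n equal-speed subsystems with `failed` failures.

    Speeds are in units of one single RAID subsystem. Each failure is
    assumed to tie up one failed subsystem plus the one intact subsystem
    that takes over, and that pair together delivers `pair_speed`.
    """
    if failed < 0 or 2 * failed > n:
        raise ValueError("each failure needs its own intact partner")
    unaffected = n - 2 * failed
    return unaffected * 1.0 + failed * pair_speed


print(normal_speed([1.0, 1.0, 1.0]))  # 3.0, i.e. 300% of one subsystem
print(degraded_speed(3, 1))           # 2.0, i.e. N-1 for N = 3
```

Under these assumptions a three-subsystem configuration drops from 3 to 2 units of speed after one failure, which is still above the roughly 1.3 units quoted for a conventional active-active pair in normal operation.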
In the normal operating mode the present invention enables each SAC to communicate with a set of disks independently of any other SAC, thus operating the redundant RAID system at the speed of N single RAID subsystems. In the event of failure of the host computer or SAC of a component single RAID subsystem, the system automatically assumes the configuration of a conventional active-active system with respect to the affected single RAID subsystem and an unaffected single RAID subsystem. The redundant RAID system continues to operate, although at a reduced speed, with the host computer and SAC of the functioning RAID subsystem having access to all of the disks of both the failed and the functioning SAC.
This invention provides a host computer and SAC redundant RAID system with a normal speed much higher than that of conventional active-active host computer and SAC redundant systems. In the event of failure of a host computer or SAC the speed of the system is no lower than that of a conventional host computer and storage array controller redundant system. If more than two single RAID subsystems are included in the redundant RAID system, the speed of the system under nearly all conditions is greater than that of the conventional redundant system.
The objective of this invention is to provide a host computer and SAC redundant RAID system which continues to operate despite the failure of a single host computer or SAC.
Another objective of this invention is to provide an N host computer and SAC redundant RAID system which, in the absence of failures, operates at the speed of N single RAID subsystems if all have the same speed, yet provides protection against host computer or SAC failure.
Another objective of this invention is to provide an N host computer and N SAC redundant RAID system which continues to operate at a reduced speed during a host computer or SAC failure while the system continues to operate at the speed of N−1 single RAID subsystems if all subsystems have the same speed.
Another objective of this invention is to provide an N host computer and SAC redundant RAID system which continues to operate as long as no more than N/2 of the single RAID subsystems suffer a failure of the host computer or SAC and each single RAID subsystem with a failed host computer or SAC is linked to an intact secondary SAC.
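The two conditions in the preceding objective can be expressed as a small check. This is a sketch only: the function name and the `secondary_of` mapping (subsystem index to the index of the subsystem whose SAC is its secondary) are illustrative assumptions, not structures defined by the specification.

```python
def system_operational(secondary_of, failed):
    """Return True if every array still has a working controller.

    secondary_of: dict mapping each subsystem to the subsystem whose SAC
                  acts as its secondary (hypothetical representation).
    failed:       iterable of subsystems with a failed host computer or SAC.
    """
    n = len(secondary_of)
    failed = set(failed)
    # Condition 1: no more than N/2 subsystems have failed.
    if 2 * len(failed) > n:
        return False
    # Condition 2: each failed subsystem is linked to an intact secondary SAC.
    return all(secondary_of[s] not in failed for s in failed)


# Four subsystems arranged in a ring (an assumed layout).
ring = {0: 1, 1: 2, 2: 3, 3: 0}
print(system_operational(ring, {0, 2}))  # True: secondaries 1 and 3 are intact
print(system_operational(ring, {0, 1}))  # False: subsystem 0's secondary failed
```

With a ring layout, two failures out of four are tolerated only when the failed subsystems are not adjacent, which shows why the second condition is needed in addition to the N/2 bound.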
Another objective is to provide a redundant RAID system with two-ported storage devices each of which is connected to both a primary SAC and to a secondary SAC.
Another objective is to provide a redundant RAID system in which fibre channel or switch fabric technology is used to maximize the speed of the system.
A final objective of this invention is to provide a host computer and SAC redundant RAID subsystem which is inexpensive, resistant to failure, easy to maintain, and is without harmful effects on the environment.