In disk array systems, for example JBOD (Just a Bunch Of Disks), RAID (Redundant Array of Independent Disks), or other systems having a plurality of devices, one or more controllers (for example disk controllers) are provided as interfaces between the host system and one or more devices (such as RAID disk devices). A Dual-Active system configuration provides maximum data availability and integrity between the host system and the disk storage. During normal operation, the availability of two RAID controllers communicating with the host provides greater data transfer bandwidth. In the event of a controller failure, the failover processor provides full data availability and integrity. Fibre Channel is a high-speed I/O interface protocol that can be carried over two categories of physical media: copper or fibre optic cable.
When two Fibre Channel disk array controllers are used in a Dual-Active system configuration, it is important that the Dual-Active system be able to continue normal operation even when either one of the two controllers has failed for any reason. Failure of one controller may result, for example, from a defective electronic component in the controller, or from loss of power to the controller, such as may occur if the controller power supply fails. Typically, a controller has an interface to the host system (either the I/O system host in the event that there are a plurality of I/O systems, or an overall system host), and an interface to devices. In the discussion that follows, we will consider a host server system and a plurality of disk drives. For such a configuration, two areas are particularly problematic relative to ensuring the system's Fibre Channel Loop (FCL) resiliency during controller failure: (1) the controller's Fibre Channel (FC) Loop connection to the host servers; and (2) the controller's Fibre Channel (FC) Loop connection to the disk drives.
One possible approach to maintaining FCL resiliency is now described relative to the typical Fibre storage system 30 in FIG. 1. In this multi-hub system 30, the FC Loop resiliency problem may be partially solved by managing each of the controllers' FC Loops with a separate external FC Hub 34, 35 for each host and/or a separate external hub 50, 51, 52, 53 for each disk channel. External hub 34 connects host server 31 to the first host ports (Hport1) 36, 38 associated with controllers 40, 41, and external hub 35 connects host server 32 to the second host ports (Hport2) 37, 39 of controllers 40, 41. In like manner, hubs 50, 51, 52, 53 connect disk ports (Dport1, Dport2, Dport3, Dport4) 42, 43, 44, 45, 46, 47, 48, 49 with disk drive loops 1-4 (Disk Loop1, Disk Loop2, Disk Loop3, Disk Loop4) 54, 55, 56, 57.
This configuration partially solves the FCL resiliency problem because conventional FC Hubs, such as hubs 34, 35, 50-53, have typically been designed to connect multiple Loop agents together within a single Loop. In this configuration, each of the Hubs 34, 35, 50-53 should recognize the failure of any of its Loop agents (i.e. Hports 36-39, Dports 42-49, Disk Loops 54-57, or Host servers 31, 32) based on the loss of a meaningful FC signal, and then bypass the failed Loop agent while ensuring adequate FC signal strength and quality for the Loop to continue normal operation. Such normal operation should be guaranteed even if the FC is implemented with the maximum standard FC cable length, whether copper or optical fibre.
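The bypass behaviour described above can be illustrated with a minimal sketch. It is not the logic of any particular hub; the `HubPort` class, the `signal_ok` flag, and the `active_loop` function are hypothetical names introduced here, and the sketch assumes the hub has already decided, per port, whether a meaningful FC signal is present.

```python
from dataclasses import dataclass


@dataclass
class HubPort:
    """One hub port and the Loop agent attached to it (hypothetical model)."""
    agent: str
    signal_ok: bool = True  # assumed: True while a meaningful FC signal is seen


def active_loop(ports):
    """Return the agents that remain in the Loop after failed ports are bypassed.

    Loop order is preserved; a port without a meaningful signal is skipped,
    so the remaining agents still form one closed Loop.
    """
    return [p.agent for p in ports if p.signal_ok]


# Hub 34 of FIG. 1: host server 31 plus the Hport1 of each controller.
hub_34 = [HubPort("Host server 31"), HubPort("Hport1 36"), HubPort("Hport1 38")]

# Suppose the controller owning Hport1 36 loses power and its signal disappears.
hub_34[0 + 1].signal_ok = False
print(active_loop(hub_34))
```

In this sketch the host server keeps its connection to the surviving controller port, which is the behaviour the multi-hub configuration relies on.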
To meet these requirements, the FC Hub ports 36-39 should satisfy at least the following two criteria. First, each FC Hub port (i.e. hub ports 36-39) should be able to intelligently discriminate between FC K28.5 characters on an FC clock frequency (within certain standard predetermined voltage levels) and random signal noise, in order to determine the proper operation and coherency of the Loop agent. Second, each Hub (i.e. Hubs 34-35 and Hubs 50-53) should be able to sync to (that is, synchronize with) the Loop's FC signal clock phase and frequency, and then re-drive the Loop's FC signal clock with adequate signal strength and quality.
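The first criterion can be sketched in software, although a real hub would perform it in hardware. The sketch below assumes the incoming bit stream is already aligned to 10-bit transmission characters and is written with the first transmitted bit on the left; the function names and the `min_k28_5` threshold are hypothetical choices for illustration. The two K28.5 bit patterns are the standard 8b/10b encodings for negative and positive running disparity.

```python
# 8b/10b encodings of the K28.5 special character (bits written "abcdeifghj").
K28_5_CODES = {"0011111010", "1100000101"}  # RD- form and RD+ form


def split_characters(bits):
    """Split an already-aligned bit string into 10-bit transmission characters."""
    usable = len(bits) - len(bits) % 10  # drop any trailing partial character
    return [bits[i:i + 10] for i in range(0, usable, 10)]


def looks_coherent(bits, min_k28_5=4):
    """Return True if the stream contains at least min_k28_5 K28.5 characters.

    min_k28_5 is an assumed threshold, not a value taken from any standard.
    """
    count = sum(ch in K28_5_CODES for ch in split_characters(bits))
    return count >= min_k28_5


# An ordered-set-like pattern: one K28.5 followed by three filler characters.
FILLER = "0101010101"  # arbitrary non-K28.5 character used only for this demo
coherent_stream = ("0011111010" + FILLER * 3) * 8
noise_stream = "0110100110" * 32  # fixed stand-in for random noise

print(looks_coherent(coherent_stream), looks_coherent(noise_stream))
```

A check of this kind only establishes that K28.5 characters are arriving; it says nothing about whether the agent is logically healthy.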
In order to meet these two criteria, the external FC Hubs 34-35, 50-53 must be of high quality, and high-quality Hubs are, given the current state of the art, relatively expensive and bulky. Typically, each external hub would be implemented as an external enclosure measuring about 2" × 12" × 12" and requiring its own AC power cable connection for operation. For example, the INTRA LINK 1000 Hub made by VIXEL of California, USA, could be used for this application. However, even if the expense and bulk of such high-quality external Hubs could be tolerated, the system 30 would still be vulnerable to certain Loop agent failure scenarios. For example, if an agent is transmitting random FC signals but has failed logically, the Hub would keep the failed agent on the FC loop, which might eventually bring the loop down due to the random incoherent transitions and break the connection between the remaining loop agents. In addition, if a power loss occurs to any of the external FC Hubs 34-35, 50-53 in the system 30, the connections of the Hub experiencing the power loss are immediately broken and the multi-hub system 30 suffers from a single point of failure. That is, a single failure is able to bring down the system, contrary to the intent of a dual-active system to protect against such single-point failures.
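The single-point-of-failure argument can be made concrete with a small reachability sketch of the host side of FIG. 1. The dictionaries and the `reachable_controllers` function are hypothetical; they encode only the connections stated in the description (hub 34 joins host server 31 to both controllers, hub 35 joins host server 32 to both controllers).

```python
# Host-side topology of FIG. 1, reduced to host -> hub -> controllers.
HOST_HUB = {"host 31": "hub 34", "host 32": "hub 35"}
HUB_CONTROLLERS = {
    "hub 34": ["controller 40", "controller 41"],
    "hub 35": ["controller 40", "controller 41"],
}


def reachable_controllers(host, failed):
    """Return the controllers a host can still reach, given a set of failed parts."""
    hub = HOST_HUB[host]
    if host in failed or hub in failed:
        return []  # an unpowered hub breaks every connection that passes through it
    return [c for c in HUB_CONTROLLERS[hub] if c not in failed]


# A controller failure is survivable: the other controller remains reachable.
print(reachable_controllers("host 31", {"controller 40"}))

# A hub power loss is not: host 31 loses both controllers at once.
print(reachable_controllers("host 31", {"hub 34"}))
```

The second case shows one failed component cutting a host off from both controllers, even though both controllers are still working.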
Therefore there is a need for a Fibre Channel Loop topology, structure, and method that provides the desired loop agent failure resiliency or redundancy without the expense of providing a separate external hub for each FC loop.
There is also a need for a Fibre Channel Loop topology, structure, and method that provides the desired loop agent failure resiliency or redundancy without the size and bulk associated with the plurality of external hubs.
There is also a need for a Fibre Channel Loop topology, structure, and method that will not experience a single point of failure in the event of a power loss to some system components; that is, there remains a need for resiliency and redundancy in the event of power failure.