Large computer storage systems comprise arrays of disk and tape drives with several controllers directing the flow of data between the disk drives and the computers. A common network topology is to have drives linked together by two linear unidirectional communication busses running in opposite directions with a controller at each end. This approach allows both controllers to communicate with each drive in the array. In practice, the workload between the controllers is often divided so that one controller services only a subset of the drives, while the other controller services another, possibly overlapping, subset of drives.
The performance of such an arrangement varies, in part, with the topology of the interconnection network. For example, each controller has a direct link to the first drive immediately adjacent on the bus. Since no other controller communicates across this segment of the bus, the full bandwidth of the bus is available between the controller and the first adjacent drive. However, as the controller tries to communicate with drives farther away on the network, it may come into contention with other controllers trying to communicate with other drives, causing an effective reduction in the available bandwidth. Another limitation of this approach is that the network relies on the continued operation of all drives at all times to keep the busses operational. When a drive fails or loses power, the busses are broken at that point, isolating the controllers from drives on the far side of the failed device.
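The isolation effect described above can be illustrated with a minimal sketch. It assumes a simplified model in which drives are numbered 0 to N-1 along the busses, a hypothetical controller "A" sits before drive 0, a hypothetical controller "B" sits after the last drive, and a failed drive breaks both busses at its position. The function name and the model are illustrative assumptions, not part of any described system.

```python
def reachable_drives(num_drives, failed):
    """Return the drives each end controller can still reach.

    Assumed model: controller "A" is attached before drive 0 and
    controller "B" after drive num_drives - 1. A failed drive breaks
    both busses at its position, so each controller reaches only the
    drives between itself and the nearest failed drive.
    """
    failed = set(failed)

    # Walk outward from controller A until the first failed drive.
    from_a = []
    for drive in range(num_drives):
        if drive in failed:
            break
        from_a.append(drive)

    # Walk outward from controller B (the opposite end) the same way.
    from_b = []
    for drive in reversed(range(num_drives)):
        if drive in failed:
            break
        from_b.append(drive)

    return {"A": from_a, "B": from_b}


if __name__ == "__main__":
    # With six drives and drive 2 failed, neither controller can reach
    # drive 2, and each loses the drives on the far side of the break.
    print(reachable_drives(6, {2}))
```

In this model a single failure still leaves every surviving drive reachable by one of the two controllers, but a second failure strands any drives located between the two breaks.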
Another network topology replaces the linear unidirectional communication busses with two communication loops. In this topology there is one controller per loop. This approach also allows both controllers to access all of the disk drives in the array, and eliminates contention between controllers by isolating them on separate loops. This approach has a practical limitation in that each disk drive must have one loop interface for each communication loop to which it is tied. As the number of controllers on dedicated communication loops increases, the cost of each drive increases due to the increase in the number of interfaces it must support. This approach may also be susceptible to failed nodes disrupting the network. If the network technology cannot route the messages through an unpowered or failed node, then the loop is broken at that point, preventing controller access to the disk drives further around the loop.
In a Fibre Channel (FC) network, devices such as controllers and drives are connected by a point-to-point network or an arbitrated loop. Only two of the devices may communicate over a loop or point-to-point network at one time. Other devices must wait to communicate. The system latency of a Fibre Channel arbitrated loop may be reduced by subdividing the network into multiple subloops. Each subloop can operate independently, thus allowing one message transfer operation to occur simultaneously within each of the subloops. Applying this approach to disk arrays, one controller and the disk drives it services most often can be assigned to each subloop. When the controller in one subloop places a request to communicate with a drive in another subloop, a hub links the two subloops. Linking the subloops through the hub allows any controller to reach any drive in the array. However, the hub only supports one source-to-destination inter-subloop link at a time. Further, while the inter-subloop link is established, the controllers in the source and destination subloops must arbitrate with each other. As with other loop topologies, a failed or unpowered node in a subloop may disrupt that subloop.
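The single-link limitation of the subloop hub can be sketched as follows. This is a simplified illustration under stated assumptions, not a model of the Fibre Channel protocol: the class name `SubloopHub`, its methods, and the string results are hypothetical, and arbitration among devices inside a subloop is deliberately left out.

```python
class SubloopHub:
    """Toy model of a hub joining subloops, one inter-subloop link at a time."""

    def __init__(self, subloops):
        # subloops: mapping of subloop name -> iterable of device names
        self.subloops = {name: set(devs) for name, devs in subloops.items()}
        self.active_link = None  # (source subloop, destination subloop) or None

    def subloop_of(self, device):
        """Return the name of the subloop that contains the device."""
        for name, devices in self.subloops.items():
            if device in devices:
                return name
        raise KeyError(device)

    def request(self, controller, drive):
        """Classify a transfer request as 'local', 'linked', or 'wait'."""
        src = self.subloop_of(controller)
        dst = self.subloop_of(drive)
        if src == dst:
            # Traffic that stays inside one subloop does not need the hub link.
            return "local"
        if self.active_link is not None:
            # The hub supports only one source-to-destination link at a time.
            return "wait"
        self.active_link = (src, dst)
        return "linked"

    def release(self):
        """Tear down the current inter-subloop link, if any."""
        self.active_link = None


if __name__ == "__main__":
    hub = SubloopHub({
        "s1": {"c1", "d1", "d2"},
        "s2": {"c2", "d3", "d4"},
        "s3": {"c3", "d5"},
    })
    print(hub.request("c1", "d1"))  # same subloop
    print(hub.request("c1", "d3"))  # first cross-subloop request
    print(hub.request("c3", "d4"))  # second cross-subloop request must wait
```

The sketch shows why cross-subloop traffic becomes a bottleneck: every request that crosses subloops is serialized through the one link, regardless of which subloops are involved.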
Hubs have been used to eliminate loop topology vulnerability to a break in the loop. The hub physically connects to each of the nodes in a star-type arrangement with connections radiating out from the hub. If a node fails or loses power, the hub circuitry senses the loss of message traffic to and from the node and switches out the failed node. Individual drives and controllers can be switched out due to failure or for maintenance and repair without a major disruption to the rest of the network. Further, new devices can be switched into the loop while the network is operational. Limitations with this approach include high cost and the need to share network bandwidth with all controllers competing for use of the network.
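The switching behavior of such a hub can be sketched in a few lines. The following is a minimal illustration, assuming the hub keeps an ordered list of ports and forms the logical loop from whichever ports are not bypassed. The class name `LoopHub` and its methods are hypothetical, and detection of lost traffic is reduced to an explicit `bypass` call.

```python
class LoopHub:
    """Toy model of a hub that forms a logical loop over star-wired ports."""

    def __init__(self, ports):
        self.ports = list(ports)   # physical port order on the hub
        self.bypassed = set()      # ports currently switched out of the loop

    def bypass(self, port):
        """Switch a failed or serviced node out of the loop."""
        if port not in self.ports:
            raise KeyError(port)
        self.bypassed.add(port)

    def insert(self, port):
        """Switch a node into the loop, adding a new port if needed."""
        if port not in self.ports:
            self.ports.append(port)
        self.bypassed.discard(port)

    def ring(self):
        """Return the nodes that traffic currently passes through, in order."""
        return [p for p in self.ports if p not in self.bypassed]


if __name__ == "__main__":
    hub = LoopHub(["ctrl0", "d0", "d1", "d2"])
    hub.bypass("d1")    # d1 has failed; the loop closes around it
    print(hub.ring())
    hub.insert("d3")    # a new drive is added while the loop is running
    print(hub.ring())
```

Because every remaining node still shares the one logical loop, the sketch also reflects the bandwidth limitation noted above: bypassing protects the loop from breaks but does nothing to separate competing controllers.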