FIG. 1 illustrates an exemplary Storage Area Network (SAN) 100 comprised of two FC-AL switches S1 and S2, each of which may be a Switch On a Chip (SOC). A number of drives 102 and initiators I1 and I2 are attached to each FC-AL switch S1 and S2 through node ports 104. Each FC-AL switch S1 and S2, when connected to drives and initiators, may form a Switched Bunch Of Disks (SBOD) 120. A processor 108 is also coupled to each switch for executing firmware 122 to perform operations on the switch, among other things.
In addition, cascade ports 106 on the FC-AL switches S1 and S2 enable the switches to be interconnected via two trunks T1 and T2. Trunking involves the use of multiple inter-switch connections for increased throughput and redundancy. In the example of FIG. 1, trunk T1 is designated as the primary trunk, and trunk T2 is designated as the duplicate trunk. Note that the designation of trunks as primary and duplicate does not merely represent a difference in names, but represents substantive differences between the two trunks. In the exemplary configuration of FIG. 1, the initiators I1 and I2 connected to S1 are able to communicate with devices connected to S2 (e.g. drive D1) through the trunks T1 and T2.
However, primary trunk T1 may fail (see 110 in FIG. 1) for a number of reasons. In conventional FC-AL switches, each FC-AL switch (switches S1 and S2 in the example of FIG. 1) responds to the failure independently. When hardware in S1 recognizes the failure, it enters a failover mode and initiates a failover event. Trunk failover and failback refer to the ability to redirect traffic from a failed trunk to a working trunk and/or back to the original primary trunk once it is re-established. Failover and failback are managed and driven by an Application Programming Interface (API) 124, which is used to manage each of the switches independently in separate enclosures.
When the failover event is initiated, a bypass event occurs in which the cascade port 106 coupled to the trunk T1 is changed from an “inserted” state (i.e. device connected) to a “bypass” state (i.e. device not connected) according to FC protocols well-understood by those skilled in the art. The hardware in S1 also generates and sends an interrupt to the processor 108 connected to S1, which recognizes that T1 is no longer available but that duplicate trunk T2 exists, and reconfigures the switch to establish T2 as the primary trunk. The processor 108 then initiates a Loop Initialization process to initialize all devices in the SAN 100. When devices receive the Loop Initialization Primitive (LIP) ordered sets at the start of the Loop Initialization cycle, they cease normal communications and enter an initialization state by sending specific ordered sets and frames out over the loop and actively reserving an address, which is necessary to establish subsequent normal communications with other devices on the loop. Normal communications may be resumed only when the Loop Initialization has completed and the participating devices have reserved a new address.
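The failover sequence described above can be sketched as a small state model. This is an illustrative sketch only, not the switch firmware or the API 124: the names `SwitchModel`, `PortState`, `on_trunk_failure`, and the event strings are hypothetical, and the hardware interrupt is modeled as a direct method call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class PortState(Enum):
    INSERTED = "inserted"  # device connected
    BYPASS = "bypass"      # device not connected


@dataclass
class SwitchModel:
    """Hypothetical stand-in for one FC-AL switch plus its processor."""
    name: str
    primary: str
    duplicate: Optional[str]
    cascade_ports: Dict[str, PortState]
    events: List[str] = field(default_factory=list)

    def on_trunk_failure(self, trunk: str) -> None:
        # Hardware step: bypass the cascade port coupled to the failed trunk.
        self.cascade_ports[trunk] = PortState.BYPASS
        self.events.append(f"bypass:{trunk}")
        # Hardware step: interrupt the processor attached to this switch.
        self.events.append("interrupt")
        self._handle_interrupt(trunk)

    def _handle_interrupt(self, failed: str) -> None:
        # Firmware step: promote the duplicate trunk only if the primary
        # failed and the duplicate's cascade port is still inserted.
        dup = self.duplicate
        if failed != self.primary or dup is None:
            return
        if self.cascade_ports.get(dup) is not PortState.INSERTED:
            return
        self.primary, self.duplicate = dup, None
        self.events.append(f"promote:{dup}")
        # Firmware step: start Loop Initialization (LIP ordered sets).
        self.events.append("lip")


s1 = SwitchModel(
    "S1", "T1", "T2",
    {"T1": PortState.INSERTED, "T2": PortState.INSERTED},
)
s1.on_trunk_failure("T1")
print(s1.events)  # ['bypass:T1', 'interrupt', 'promote:T2', 'lip']
```

The model records only the order of events on a single switch; it does not model the address reservation that each device performs during Loop Initialization.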
Because each switch is managed by a separate instance of the API, installed in separate enclosures, trunk failures and re-inserts between two enclosures may not be detected simultaneously by the APIs. As a result, failover and failback may not take place simultaneously; instead, these often occur sequentially, separated by up to approximately 40 ms of time, depending on the rate at which firmware accesses the API. These sequential failover or failback events cause multiple loop initializations to be triggered, which can lead to multiple system issues.
If S1 recognizes the failure before S2, it is possible that the LIP ordered sets generated by S1 may cause all devices attached to S1 to be initialized before switch S2 recognizes the failure and enters a failover mode. However, because S2 has not yet recognized the failure and reconfigured itself to establish T2 as the primary trunk, when S1 tries to send LIP ordered sets down the duplicate trunk T2 (see 112 in FIG. 1), they will be blocked. (Note that duplicate trunks are not allowed to participate in loop initialization, and therefore block LIP primitives and loop initialization frames.) When S2 finally recognizes the failure and enters the failover mode, it reconfigures itself to establish T2 as the primary trunk, and may generate its own LIP ordered sets, which cause all devices attached to S2 to be initialized. Furthermore, because S1 had already recognized the failure and had established T2 as the primary trunk, S2 is able to send LIP ordered sets up T2 (see 114 in FIG. 1), which results in the devices connected to S1 being initialized a second time, a duplication of effort with respect to the devices connected to S1.
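The ordering problem described above can be reproduced with a toy simulation. This is a minimal sketch under stated assumptions: `LoopSwitch` and its methods are hypothetical names, each switch simply counts how many times its attached devices are initialized, and a LIP arriving over T2 is accepted only if the receiving switch has already promoted T2 to primary.

```python
class LoopSwitch:
    """Hypothetical model of one switch and the devices attached to it."""

    def __init__(self, name):
        self.name = name
        self.t2_is_primary = False  # becomes True once this switch fails over
        self.device_inits = 0       # times the local devices were initialized
        self.peer = None

    def receive_lip_over_t2(self):
        # A duplicate trunk blocks LIP primitives; a primary trunk passes them.
        if not self.t2_is_primary:
            return False
        self.device_inits += 1
        return True

    def fail_over(self):
        # Promote T2, initialize local devices, then send LIP over T2.
        self.t2_is_primary = True
        self.device_inits += 1
        return self.peer.receive_lip_over_t2()


s1, s2 = LoopSwitch("S1"), LoopSwitch("S2")
s1.peer, s2.peer = s2, s1

s1_lip_passed = s1.fail_over()  # S2 still treats T2 as duplicate: blocked
s2_lip_passed = s2.fail_over()  # S1 already promoted T2: LIP gets through

print(s1_lip_passed, s2_lip_passed)      # False True
print(s1.device_inits, s2.device_inits)  # 2 1
```

In this model the devices on S1 are initialized twice and those on S2 once, matching the duplication described in the text. Timing is not modeled; only the order in which the two switches fail over matters.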
In the scenario described above, two independent failover events including multiple LIP ordered sets are generated. During a failover event, no data can be communicated between devices attached to the switches S1 and S2. This time period where no regular communication is possible is approximately the time difference between when the first switch recognized the failure and when the second switch recognized the failure (which is somewhat random and non-deterministic), plus the time it takes for Loop Initialization to complete in both switches, and may represent a potentially severe disruption to data communications.
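As a rough illustration of the estimate above, the disruption window can be written as the detection skew plus the Loop Initialization time on each switch. This sketch assumes that "in both switches" means the two initialization times add; the function name `disruption_window_ms` and the 15 ms loop initialization durations are hypothetical placeholders, not measured values. The 40 ms skew is the approximate upper bound given earlier for sequential failover events.

```python
def disruption_window_ms(t_first_detect_ms, t_second_detect_ms,
                         loop_init_first_ms, loop_init_second_ms):
    """Approximate time with no regular communication, in milliseconds."""
    # Time between the first and second switch recognizing the failure.
    detection_skew = abs(t_second_detect_ms - t_first_detect_ms)
    # Assumption: the two Loop Initialization cycles run back to back.
    return detection_skew + loop_init_first_ms + loop_init_second_ms


# Placeholder inputs: S1 detects at t=0 ms, S2 at t=40 ms,
# and each Loop Initialization is assumed to take 15 ms.
window = disruption_window_ms(0, 40, 15, 15)
print(window)  # 70
```

Because the detection skew is non-deterministic, the result is best read as one sample from a range rather than a fixed figure.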
Note that the possibility of multiple failover events caused by a trunk down event only occurs when the switches are FC-AL switches that act like hubs to connect the devices as though they were in a regular shared arbitrated loop, yet allow multiple switches to be cascaded using trunks. Multiple failover events are a side effect of the capability of the FC-AL switches, and occur because the devices connected to the switches were not designed to handle non-deterministic failover modes.
Therefore, there is a need to be able to synchronize and coordinate trunk failover and failback between the two FC-AL switches when a trunk failure occurs and the primary trunk designation is changed in order to minimize the disruption to data communications caused by multiple unnecessary loop initialization cycles.