1. Field of the Invention
This invention relates to networking technology, and more particularly to apparatus and methods for managing fault conditions in data connections.
2. Background of the Invention
The Open Systems Interconnection model (OSI model) provides a widely used model for sub-dividing a communications system into layers. A layer is a collection of conceptually similar functions that provides services to the layer above it and receives services from the layer below it. The first layer is the physical layer, which translates binary data to and from electrical or optical signals. The second layer is the data link layer, which provides the functional and procedural means to transfer data between network entities and to detect and possibly correct errors that may occur in the physical layer. The third layer is the network layer, which provides the functional and procedural means of transferring variable length data sequences from a source to a destination via one or more networks. The fourth layer is the transport layer, which manages transparent transfer of data between communicating entities.
Communication between a host and storage devices in a high performance storage network is often accomplished using the Fibre Channel architecture. Fibre Channel is a networking technology primarily used to couple storage devices to a computer system, such as a server or mainframe, using fiber optic cables, though it may be used for other applications and with other cable types. The Fibre Channel architecture defines layers FC-0 through FC-2, which correspond to the physical, data link, and network layers. The Fibre Channel architecture additionally defines a common services layer (FC-3) and an application layer (FC-4) that interfaces with transport layer protocols such as SCSI, IP, and FICON.
The FICON (Fibre-Connectivity) protocol, which has been adopted as the ANSI FC-SB-4 protocol, is used to manage transport of data over a Fibre Channel cabling infrastructure. In the FICON protocol, communication occurs between entities referred to as a channel and a control unit coupled by means of a logical path. Multiple logical paths may be associated with a single port and/or physical channel connecting the channel and control unit. Each logical path may additionally have associated therewith, and communicating thereover, several devices, such as hard drives, tape drives, RAID arrays, or the like. The channel initiates input and output operations over the logical path by transmitting instructions to the control unit.
In many applications, the control unit and channel are coupled to one another across a network fabric including many network devices such as switches, routers, hubs, and the like. Many different paths may therefore exist between the control unit and channel. The network devices may have internal logic that determines the routing of data through the fabric and the order in which data is transmitted.
The lower protocol layers, e.g., the physical, data link, and network layers, and the switches and other network devices may control the flow of data across the fabric based on an “exchange” to which each unit of data belongs. An exchange includes logically associated sequences of data transmitted in both directions between the channel and control unit and is analogous to a conversation between the channel and control unit. The FICON architecture uses two Fibre Channel exchanges to establish a connection between a channel and control unit—one for communications initiated by the channel and another for communications initiated by the control unit. Instructions from the channel to a control unit may be sent on one exchange, whereas the control unit may respond over the second exchange upon executing the instruction. Inasmuch as two exchanges are present, different messages and instructions may be routed along different paths within the fabric.
Difficulty and delays arise when a fault condition is detected. Upon detecting the fault condition, the control unit will respond to all instructions from the channel by sending a fault indicator until the fault condition is cleared. Due to delays in data propagation, the channel may send several instructions before receiving the first fault indicator. Upon receiving the fault indicator, the channel will refrain from sending further instructions on the logical path until a fault clear indicator is received. Upon clearing the fault condition, the control unit sends a fault clear indicator to the channel.
Due to variation in the propagation delay of communications across the fabric, it is possible for the fault clear indicator to be received by the channel prior to one or more previously-sent fault indicators. In such instances, the channel may receive the fault clear indicator, and shortly thereafter receive a fault indicator corresponding to the same fault condition that the fault clear indicator is clearing. The channel will therefore again pause transmission of instructions until another fault clear indicator is received. Inasmuch as the fault condition has been cleared, the control unit will not send another fault clear indicator. Communication between the channel and control unit therefore ceases on the logical path.
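By way of illustration only, the stall described above may be reproduced with a minimal sketch of the channel's pause logic. The sketch assumes a simplified channel that tracks a single paused flag per logical path; the names Channel, FAULT, FAULT_CLEAR, and deliver are hypothetical and do not appear in the FICON protocol.

```python
from dataclasses import dataclass

# Hypothetical labels for the two indicator types discussed above.
FAULT = "FAULT"
FAULT_CLEAR = "FAULT_CLEAR"


@dataclass
class Channel:
    """Toy model of the channel's state for one logical path (an assumption)."""
    paused: bool = False

    def receive(self, indicator: str) -> None:
        # A fault indicator pauses transmission of instructions;
        # a fault clear indicator resumes it.
        if indicator == FAULT:
            self.paused = True
        elif indicator == FAULT_CLEAR:
            self.paused = False


def deliver(arrival_order):
    """Feed indicators to a fresh channel in arrival order; return whether it ends paused."""
    channel = Channel()
    for indicator in arrival_order:
        channel.receive(indicator)
    return channel.paused


# The control unit sends two fault indicators, then one fault clear indicator.
in_order = [FAULT, FAULT, FAULT_CLEAR]
# The fabric delivers the fault clear indicator ahead of the second fault indicator.
reordered = [FAULT, FAULT_CLEAR, FAULT]

print(deliver(in_order))   # False: the channel resumes sending instructions
print(deliver(reordered))  # True: the channel stays paused, and no further clear will arrive
```

In the reordered case the channel ends in the paused state even though the control unit has already cleared the fault condition, which corresponds to the cessation of communication on the logical path described above.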
In view of the foregoing, what is needed is a method and apparatus for effectively communicating clearing of a fault condition despite potential variation in the propagation time for fault indicators and fault clear indicators. Such a method and apparatus should advantageously do so without requiring modification of the operation of lower protocol layers.