A System Area Network (SAN) is used to interconnect nodes within a distributed computer system, such as a cluster. The SAN is a type of network that provides high-bandwidth, low-latency communication with a very low error rate. SANs often use fault-tolerant technology to ensure high availability. The performance of a SAN resembles that of a memory subsystem more than that of a traditional local area network (LAN).
The preferred embodiments will be described as implemented in the ServerNet architecture, manufactured by the assignee of the present invention, which is a layered transport protocol for a System Area Network (SAN). The ServerNet II protocol layers for an end node and for a routing node are illustrated in FIG. 1. A single NIC and VI session layer may support one or two ports, each with its associated transaction, packet, link-level, MAC (media access) and physical layer. Similarly, routing nodes with a common routing layer may support multiple ports, each with its associated link-level, MAC and physical layer. The link layer protocol provides link management functions, encoding and decoding of data and commands, and buffering of received packet data. The ServerNet II link layer protocol is a set of simple protocols running concurrently to manage the flow of status and packet data between ports on independent nodes. Each port contains a transmitter (TxPort) and a receiver (RxPort), which cooperate to manage the link.
Support for two ports enables the ServerNet SAN to be configured in both non-redundant and redundant (fault-tolerant, or FT) configurations. In a fault-tolerant configuration with two networks, one port of each end node may be connected to each network to provide continued message communication in the event that one of the networks fails. In a fault-tolerant SAN, nodes may also be ported into a single fabric, or single-ported end nodes may be grouped into pairs to provide duplex FT controllers. The fabric is the collection of routers, switches, connectors, and cables that connects the nodes in a network.
The SAN includes end nodes and routing nodes connected by physical links. End nodes generate and consume data packets. Routing nodes never generate or consume data packets; they simply pass packets along from the source end node to the destination end node.
Each node includes one or more duplex ports, each connected to a physical link. A link layer protocol (LLP) manages the flow of status and packet data between ports on independent nodes.
The ServerNet SAN can perform system management from a single point anywhere in the SAN. SAN management performs many functions, including the collection of error information to isolate faults to the link or module where they occurred.
An "In Band Control" (IBC) mechanism supports a low-overhead way of performing SAN management functions. The term "in band" indicates that the network management control data travels over the existing SAN links, with no separate cable or LAN connection. In contrast to data packets, IBC packets are generated and consumed by both routing nodes and end nodes. IBC packets are not routed like data packets; instead, each IBC packet contains embedded source routing information. Each router or end node that receives an IBC packet forwards it to the next destination in the source route list.
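A minimal Python sketch of the source-routed forwarding described above. The packet layout (a `route` list naming every node from the originator to the final destination, plus a `hop_index` cursor) and the node names are hypothetical; the text states only that each IBC packet carries its own source route and that each receiving router or end node forwards it to the next entry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IbcPacket:
    # Hypothetical layout: the complete source route travels with the packet.
    route: list[str]        # route[0] is the originator, route[-1] the destination
    hop_index: int = 0      # position of the node currently holding the packet
    payload: bytes = b""


def forward(packet: IbcPacket, here: str) -> str | None:
    """Called by node `here` on receipt; return the next hop, or None to consume."""
    if packet.route[packet.hop_index] != here:
        raise ValueError(f"{here!r} is not the current hop of this IBC packet")
    if packet.hop_index + 1 == len(packet.route):
        return None  # this node (router or end node) is the destination
    packet.hop_index += 1
    return packet.route[packet.hop_index]


def deliver(packet: IbcPacket) -> list[str]:
    """Walk the packet hop by hop and return the nodes it visited."""
    node = packet.route[packet.hop_index]
    visited = [node]
    while (nxt := forward(packet, node)) is not None:
        visited.append(nxt)
        node = nxt
    return visited


pkt = IbcPacket(route=["SP-0", "router-1", "router-2", "router-3"])
print(deliver(pkt))  # ['SP-0', 'router-1', 'router-2', 'router-3']
```

Because no routing tables are consulted, a routing node can be the final destination just as easily as an end node, which matches the statement that both kinds of node consume IBC packets.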
The ServerNet SAN includes a maintenance system responsible for system initialization, fault reporting, diagnostics, and environmental control. A pair of service processors (SPs) manages the maintenance system. The SPs function as ServerNet I/O controllers and communicate with each other only via the ServerNet SAN.
The maintenance system uses dual system-maintenance buses, which form redundant trees independent of the normal system functional paths and provide a path of two industry-standard interconnects. The maintenance system controls, initializes, tests, and monitors all ASIC operations and provides a means for ASIC initialization, SAN topology determination, and error reporting.
In the ServerNet SAN, either data or command symbols are continually being transmitted on a link. IDLE commands are transmitted when there are no packets or other commands to be sent. FILL commands are inserted into a stream of packet data when the flow control protocol indicates that the receive port cannot accept additional packet data.
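The following Python sketch shows how a transmitter could choose one symbol per symbol time so that the link is never silent. It is a simplified model under stated assumptions: packets are lists of data symbols, `remote_ready` gives the remote receiver's last reported flow-control state for each symbol time, a new packet is started only when the remote receiver is ready, and other commands (such as the BUSY/READY commands this port itself sends) are left out.

```python
from collections import deque
from typing import Iterable, Iterator, List

IDLE, FILL = "IDLE", "FILL"


def transmit(packets: Iterable[List[str]], remote_ready: Iterable[bool]) -> Iterator[str]:
    """Yield exactly one symbol (data or command) per symbol time."""
    pending = deque(deque(p) for p in packets if p)  # skip empty packets
    current = None  # packet currently being sent, if any
    for ready in remote_ready:
        if current is None and pending and ready:
            current = pending.popleft()  # assumed: start a packet only when remote is ready
        if current is None:
            yield IDLE   # no packet in progress: keep the link active with IDLE
        elif ready:
            yield current.popleft()  # next data symbol of the packet
            if not current:
                current = None       # packet finished
        else:
            yield FILL   # remote receiver is busy: pad the packet data stream


print(list(transmit([["d0", "d1", "d2"]], [True, True, False, True, True])))
# ['d0', 'd1', 'FILL', 'd2', 'IDLE']
```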
The LLP manages a BUSY/READY flow control protocol used to communicate all changes in the state of a port's receiver to the remote node. When the port's receiver state changes to "inbound busy" (i.e., the port cannot accept more data packets), its transmitter sends a BUSY command. When the receiver state changes to "inbound ready", its transmitter sends a READY command.
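A minimal Python sketch of the BUSY/READY rule, in which the receiver reports only changes of state through its own transmitter. The buffer model and the `busy_threshold` parameter are hypothetical: the text does not say when a receiver becomes inbound busy, so this sketch simply declares it busy once its buffer occupancy reaches a threshold.

```python
from typing import Callable


class RxPort:
    """Receiver that reports inbound busy/ready changes through its local TxPort."""

    def __init__(self, busy_threshold: int, send_command: Callable[[str], None]):
        self.busy_threshold = busy_threshold  # hypothetical high-water mark
        self.occupancy = 0                    # buffered packet data (arbitrary units)
        self.busy = False                     # starts "inbound ready"
        self.send_command = send_command      # hands a command to the local transmitter

    def _update_state(self) -> None:
        busy = self.occupancy >= self.busy_threshold
        if busy != self.busy:                 # send a command only on a state change
            self.busy = busy
            self.send_command("BUSY" if busy else "READY")

    def accept(self, units: int = 1) -> None:
        self.occupancy += units
        self._update_state()

    def drain(self, units: int = 1) -> None:
        self.occupancy = max(0, self.occupancy - units)
        self._update_state()


sent = []
rx = RxPort(busy_threshold=2, send_command=sent.append)
rx.accept(); rx.accept(); rx.drain(); rx.drain()
print(sent)  # ['BUSY', 'READY']
```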
The LLP also manages a link alive protocol which uses the flow control commands (BUSY/READY) to implement a heartbeat which is monitored by the remote receiver on the link. Periodically, the link-alive protocol triggers transmission of a flow control command that indicates the current state of the local receiver.
The flow control protocol requires that ports transmit a flow control command whenever the state of their receiver changes. The link alive protocol requires that ports repeat the last flow control command transmitted when no local receiver state change has occurred for approximately 512 symbol times.
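The combined flow-control and link-alive rules for a transmitter can be sketched in Python as a per-symbol-time counter. The class name and the `interval` parameter are hypothetical; the default of 512 reflects the "approximately 512 symbol times" above, and the sketch assumes the receiver starts in the ready state.

```python
from typing import Optional

LINK_ALIVE_INTERVAL = 512  # "approximately 512 symbol times"; exact value assumed


class FlowControlTx:
    """Decides, each symbol time, whether to send a flow-control command."""

    def __init__(self, interval: int = LINK_ALIVE_INTERVAL):
        self.interval = interval
        self.last_cmd = "READY"   # assumed initial receiver state
        self.quiet_time = 0       # symbol times since the last flow-control command

    def tick(self, rx_changed_to: Optional[str] = None) -> Optional[str]:
        """Advance one symbol time; `rx_changed_to` is "BUSY"/"READY" on a state change."""
        if rx_changed_to is not None:
            # Flow control protocol: report every receiver state change at once.
            self.last_cmd = rx_changed_to
        else:
            self.quiet_time += 1
            if self.quiet_time < self.interval:
                return None
        # A state change or the link-alive heartbeat: (re)send and restart the timer.
        self.quiet_time = 0
        return self.last_cmd


tx = FlowControlTx(interval=4)
print([tx.tick() for _ in range(4)])  # [None, None, None, 'READY']
print(tx.tick("BUSY"))                # BUSY
```

Resetting the timer on every flow-control command, whether caused by a state change or by the heartbeat, is what makes the same BUSY/READY commands serve both protocols.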
The receive port at each end of the link must monitor the link alive commands from the remote port to determine the state of the link. A link is considered "alive" when it is receiving link alive commands regularly, and "dead" when a receiver detects no link alive commands within a predetermined time period. The link exception protocol is notified when the link state changes from "alive" to "dead". Receive ports provide a "link alive" status bit indicating whether the link is obeying the applicable link alive protocol. Transitions of the "link alive" status bit must be capable of causing an interrupt (either directly at an end node or via the maintenance interface at routing nodes).
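A Python sketch of the receive-side monitor. The `timeout` value stands in for the unspecified "predetermined time period" (it would need to exceed the roughly 512-symbol heartbeat interval), the `on_transition` callback stands in for the interrupt, and the assumption that a link starts out dead until the first command arrives belongs to this sketch, not to the text.

```python
from typing import Callable


class LinkAliveMonitor:
    def __init__(self, timeout: int, on_transition: Callable[[bool], None]):
        self.timeout = timeout              # "predetermined time period" (value assumed)
        self.on_transition = on_transition  # models the interrupt on status-bit changes
        self.link_alive = False             # the "link alive" status bit
        self.since_last = 0                 # symbol times since the last link alive command

    def tick(self, got_link_alive_cmd: bool) -> None:
        """Advance one symbol time."""
        if got_link_alive_cmd:
            self.since_last = 0
            self._set(True)
        else:
            self.since_last += 1
            if self.since_last >= self.timeout:
                self._set(False)  # alive -> dead is also where link exception handling starts

    def _set(self, alive: bool) -> None:
        if alive != self.link_alive:
            self.link_alive = alive
            self.on_transition(alive)


events = []
mon = LinkAliveMonitor(timeout=3, on_transition=events.append)
for cmd in [True, False, False, False, True]:
    mon.tick(cmd)
print(events)  # [True, False, True]
```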
The ServerNet SAN has been enhanced to improve performance. The original ServerNet SAN configuration is designated SNet I, and the improved configuration is designated SNet II. Among the improvements implemented in the SNet II SAN are a higher transfer rate and a different symbol encoding. To attach SNet I end nodes and routing nodes to serial cables, a special two-port router ASIC is used that matches SNet I devices to SNet II devices. This two-port router is referred to as a "link extender" in this document. The link extender includes a local port coupled to a shorter link and a remote port coupled to a longer link. The remote port includes a large FIFO to compensate for the latency of the longer link. The term "link extender" is used herein only as a convenient name and does not connote any limitations on the functioning of the device.
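The reason the remote port needs a large FIFO can be made concrete with a back-of-the-envelope calculation, sketched here in Python under assumptions not stated in the text: once the remote port's receiver sends BUSY, the symbols already on the long cable, plus those sent while the BUSY command travels back, must still be absorbed, so the headroom is roughly twice the one-way latency expressed in symbol times. The cable length, symbol rate, signal velocity (2 × 10^8 m/s, a common rule of thumb for cables) and the `turnaround_symbols` allowance are all hypothetical inputs.

```python
import math


def fifo_headroom_symbols(cable_m: float, symbol_rate_hz: float,
                          velocity_m_per_s: float = 2e8,
                          turnaround_symbols: int = 0) -> int:
    """Rough FIFO space needed after BUSY is sent, in symbols (illustrative only)."""
    # Symbols in flight on the cable in one direction.
    one_way = cable_m * symbol_rate_hz / velocity_m_per_s
    # Data already in flight, plus data sent while BUSY travels back to the sender,
    # plus any time the sender needs to react to BUSY.
    return math.ceil(2 * one_way + turnaround_symbols)


print(fifo_headroom_symbols(cable_m=1000, symbol_rate_hz=1e8))  # 1000
```

Because the headroom grows linearly with cable length, the short local link needs little buffering while the long remote link needs a large FIFO.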
The link extenders normally operate without the intervention of system error handling software. The system error handling software treats a connection including link extenders as if it were a single link.
A typical connection utilizing link extenders is depicted in FIG. 2. End node A is connected to a first link extender x by link 1. The first link extender x is coupled by link 2 to a second link extender y. The second link extender y is coupled by link 3 to end node B.
Link alive status information is not propagated between the ports of a link extender. Thus, loss of link alive on link 1 would not be propagated to end node B, and no interrupt would be generated there.
According to one aspect of the invention, a link extender includes link alive propagation logic for propagating the loss and resumption of link alive between the ports of the link extender.
According to another aspect of the invention, the link alive propagation logic monitors a status bit maintained at each port, which is set to indicate that a link is alive and reset to indicate that a link is dead. If the status bit in one port is reset, the link alive propagation logic asserts a blocking signal to block transmission of link alive commands at the other port.
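A minimal Python sketch of the propagation logic within a single link extender, in which each port's link alive status bit drives a blocking signal for the opposite port's transmitter. The class and attribute names are hypothetical, and the sketch does not model what a blocked transmitter sends in place of link alive commands.

```python
from dataclasses import dataclass


@dataclass
class LinkAlivePropagation:
    local_alive: bool = True    # status bit of the local port (shorter link)
    remote_alive: bool = True   # status bit of the remote port (longer link)

    @property
    def block_remote_tx(self) -> bool:
        # Local link dead -> block link alive commands out of the remote port.
        return not self.local_alive

    @property
    def block_local_tx(self) -> bool:
        # Remote link dead -> block link alive commands out of the local port.
        return not self.remote_alive


logic = LinkAlivePropagation()
logic.local_alive = False                            # local link dies
print(logic.block_remote_tx, logic.block_local_tx)   # True False
logic.local_alive = True                             # link alive resumes
print(logic.block_remote_tx)                         # False
```

Deriving the blocking signals directly from the status bits means resumption of link alive is propagated as automatically as its loss.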
According to another aspect of the invention, a connection includes a first node, first and second link extenders, a second node, a first link coupling the first node and the first link extender, a second link coupling the link extenders, and a third link coupling the second link extender to the second node. If the first link becomes dead, the first link extender stops transmitting link alive commands on the second link, and the second link extender in turn stops transmitting link alive commands on the third link. Thus, link alive status is propagated through the link extenders of the connection, and the second node is alerted that the first link is dead.
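The end-to-end effect on the FIG. 2 connection can be checked with a small Python model of one direction of travel (end node A toward end node B; the other direction is symmetric). The function name and the `links_working` encoding are hypothetical, and the model ignores timing: it computes only the steady-state link alive status bit seen by end node B, with and without propagation in the link extenders.

```python
from typing import Sequence


def b_link_alive(links_working: Sequence[bool], propagate: bool = True) -> bool:
    """Status bit at end node B for the chain A, link 1, x, link 2, y, link 3, B.

    links_working[i] is False if link i+1 no longer delivers link alive commands.
    """
    sending = True      # assumed: end node A keeps sending link alive commands on link 1
    received = False
    for working in links_working:
        received = sending and working   # status bit at the port receiving this link
        # With propagation, an extender sends link alive commands only if its other
        # port's status bit is set; without it, the extender sends them regardless.
        sending = received if propagate else True
    return received


print(b_link_alive([False, True, True]))                   # False: B learns link 1 is dead
print(b_link_alive([False, True, True], propagate=False))  # True: the failure stays hidden
```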