A System Area Network (SAN) is used to interconnect nodes within a distributed computer system, such as a cluster. The SAN is a type of network that provides high-bandwidth, low-latency communication with a very low error rate. SANs often utilize fault-tolerant technology to assure high availability. The performance of a SAN resembles that of a memory subsystem more than that of a traditional local area network (LAN).
The preferred embodiments will be described as implemented in the ServerNet™ architecture, manufactured by the assignee of the present invention, which is a layered transport protocol for a System Area Network (SAN). A single session layer may support one or two ports, each with its associated transaction, packet, link-level, MAC (media access) and physical layer. The layer designated the “session layer” in the ServerNet™ description corresponds to the transaction layer described in other layered network protocols. Similarly, routing nodes with a common routing layer may support multiple ports, each with its associated link-level, MAC and physical layer.
Each node includes duplex ports connected to the physical link. A link layer protocol (LLP) manages the flow of status and packet data between ports on independent nodes. The ServerNet™ II link layer protocol is a set of protocols running concurrently to manage the flow of status and packet data between ports. Two types of symbols are used on a link: data symbols and command symbols. Data symbols are used to transport packet data. Command symbols are used to implement link management and control functions.
Each ServerNet™ port continuously transmits signals so that the port's status can always be checked. IDLE command symbols are transmitted between packets. The ServerNet™ protocol requires that packets be transmitted as a continuous stream of data symbols or FILL command symbols. Thus, if transmit data is unavailable (data under-run), a packet is extended by transmitting FILL symbols until additional data becomes available. Data under-run can result from transmission by an end-node with low bandwidth or high memory latency. Such end-nodes may not be capable of sustaining a ServerNet™ data stream without buffering.
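The packet extension rule described above can be illustrated with a minimal sketch. It is not ServerNet code: the function name `link_stream`, the string symbol names, and the representation of the source as a list with one entry per symbol time (`None` meaning no data is ready) are assumptions made only for illustration.

```python
from typing import List, Optional

IDLE = "IDLE"  # command symbol sent between packets
FILL = "FILL"  # command symbol sent inside a packet during data under-run


def link_stream(slots: List[Optional[str]]) -> List[str]:
    """Return the symbols a port puts on the link for one packet.

    `slots` has one entry per symbol time (a hypothetical model of the
    source): a data symbol if one is ready, or None on data under-run.
    """
    out: List[str] = []
    for data in slots:
        if data is None:
            out.append(FILL)   # extend the packet until data is available
        else:
            out.append(data)   # normal data symbol
    out.append(IDLE)           # the link returns to IDLE after the packet
    return out


print(link_stream(["d0", None, None, "d1", "d2"]))
```

In this model a three-symbol packet occupies the link for five symbol times because of the two-symbol under-run, which is the extension that leads to the congestion discussed next.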
The extension of a packet by FILL symbols can result in fabric congestion, as depicted in FIG. 1. In FIG. 1, the packet traveling from node #0 to node #14 has been extended by FILL command symbols due to data under-run at its source node (#0). As a result, the packet traveling from node #5 to node #18 is blocked by the extended packet.
There are two common buffer design approaches to the dual problems of transmitter under-run and packet latency. The first is to fully buffer the transmit data to ensure that an under-run condition never occurs. However, because all the transmit data is stored before the data packet is forwarded, transmission latency is maximized.
The second approach, using a FIFO buffer to transmit data, is commonly used in local area networks (LANs) and wide area networks (WANs). These networks have the option of extending or aborting a packet if under-run occurs. However, if the system extends the packet during data under-run, then blocking of packets carrying data can occur, as described above with reference to FIG. 1.
Accordingly, neither of the standard approaches presents an optimum solution for a high-performance SAN.
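The trade-off between the two approaches can be made concrete with a small, hedged model. The function names, the `arrivals` list (the symbol time at which each data symbol becomes available in the transmit buffer), and the assumption that the link carries one symbol per symbol time are illustrative only and do not come from the ServerNet description.

```python
from typing import Dict, List


def fifo_with_extension(arrivals: List[int]) -> Dict[str, int]:
    """FIFO approach: send each symbol as soon as it is available,
    filling any gap with FILL symbols (packet extension)."""
    first = arrivals[0]
    prev = first
    fills = 0
    for ready in arrivals[1:]:
        send = max(prev + 1, ready)   # wait if the next symbol is late
        fills += send - (prev + 1)    # each waited symbol time is a FILL
        prev = send
    return {
        "first_symbol_time": first,
        "fill_symbols": fills,
        "link_busy_time": prev - first + 1,
    }


def store_and_forward(arrivals: List[int]) -> Dict[str, int]:
    """Fully buffered approach: start only once the last symbol has arrived."""
    return {
        "first_symbol_time": arrivals[-1],
        "fill_symbols": 0,
        "link_busy_time": len(arrivals),
    }


arrivals = [0, 1, 5, 6]  # hypothetical source that stalls after two symbols
print(fifo_with_extension(arrivals))
print(store_and_forward(arrivals))
```

For this hypothetical source, the FIFO approach starts at symbol time 0 but holds the link for 7 symbol times, 3 of them carrying FILL symbols, while the fully buffered approach holds the link for only 4 symbol times but cannot start until symbol time 6.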
According to one aspect of the invention, a speculative transmit function is implemented utilizing a configurable logical buffer. At the start of packet transmission, the logical buffer is configured as a FIFO so that data transmission begins immediately; latency is reduced because transmission is not delayed until all data to be transmitted is fully buffered. However, if a data under-run occurs, packet extension is allowed only for a fixed time period, after which transmission of the packet data is abandoned.
According to another aspect of the invention, transmission of the data packet is abandoned immediately when data under-run occurs.
According to another aspect of the invention, abandonment of transmission is indicated by terminating a packet with a special symbol indicating that the packet is not to be processed or reported in error by intermediate routing nodes or its destination.
According to another aspect of the invention, subsequent to abandonment of packet transmission, the logical buffer is reconfigured as a STORE-AND-FORWARD buffer and all transmission data is buffered prior to restarting transmission of the packet data.
According to another aspect of the invention, if data becomes available after a data under-run occurs but before the fixed period expires, then transmission of the packet data is not abandoned and continues.
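The aspects summarized above can be sketched together as a small simulation. This is an illustration under stated assumptions and not the claimed implementation: the function `speculative_transmit`, the symbol name `ABANDON` (standing in for the special terminating symbol, whose actual name is not given here), the `arrivals` model of the source, and the choice to measure the fixed period as `fill_limit` consecutive FILL symbols per under-run are all hypothetical.

```python
from typing import List

IDLE = "IDLE"        # command symbol sent between packets
FILL = "FILL"        # command symbol sent during data under-run
ABANDON = "ABANDON"  # hypothetical name for the special terminating symbol


def speculative_transmit(arrivals: List[int], fill_limit: int) -> List[str]:
    """Simulate one packet. arrivals[i] is the symbol time at which data
    symbol i becomes available; fill_limit is the assumed fixed period,
    in symbol times, for which packet extension is allowed."""
    n = len(arrivals)
    out: List[str] = []
    t = arrivals[0]
    sent = 0
    fills = 0
    # Phase 1: logical buffer configured as a FIFO, transmit immediately.
    while sent < n:
        if arrivals[sent] <= t:
            out.append(f"D{sent}")
            sent += 1
            fills = 0              # data arrived in time, keep going
        elif fills < fill_limit:
            out.append(FILL)       # under-run, extension still allowed
            fills += 1
        else:
            out.append(ABANDON)    # fixed period expired, abandon packet
            t += 1
            break
        t += 1
    else:
        out.append(IDLE)           # packet completed in FIFO mode
        return out
    # Phase 2: reconfigured as store-and-forward, buffer everything first.
    while t < arrivals[-1]:
        out.append(IDLE)
        t += 1
    out.extend(f"D{i}" for i in range(n))  # restart the whole packet
    out.append(IDLE)
    return out


print(speculative_transmit([0, 1, 5, 6], fill_limit=1))
print(speculative_transmit([0, 1, 5, 6], fill_limit=3))
```

With `fill_limit=1` the packet is abandoned after one FILL symbol and resent once fully buffered; with `fill_limit=3` the data arrives before the period expires and transmission continues. Setting `fill_limit=0` models the aspect in which transmission is abandoned immediately on under-run.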