The present application pertains to communications over networks. In particular, the present application pertains to increasing the reliability of packet transmission over InfiniBand® networks, thereby providing greater network bandwidth because less overhead is required for error handling.
InfiniBand® is a trademark of the InfiniBand® Trade Association. InfiniBand® networks typically rely on end nodes, i.e., source and destination nodes, to handle reliability and error issues such as error checking, timeouts, and acknowledgments. All the links connecting a source end node to a destination end node are assumed to operate at an acceptable level of reliability. An end node may be a source or a destination and usually operates as both. Links can consist of optical fiber, coaxial cable, copper wire, or other media, any of which can experience bit errors caused, for example, by noise or static on the line. Such errors must be addressed for data transmissions to be accurate and useful. Currently, InfiniBand® applies an end-to-end protocol to each packet, preferably including error checking. If an error is detected in a received packet, the receiver can request that the sender resend the packet, or the receiver can withhold its acknowledgment until the sender's timer expires and the sender decides to resend the packet. Many other well-known policies, protocols, and techniques can be employed to retransmit erroneous packets.
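The receiver-side decision described above (accept and acknowledge, or ask for a resend) can be sketched as follows. This is a minimal Python illustration, not the InfiniBand® wire format: the `Packet` layout, the `make_packet` and `receive` helpers, the textual "ACK"/"RESEND" replies, and the use of CRC-32 from Python's `zlib` module are all assumptions chosen for clarity. InfiniBand® packets carry their own CRC fields.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int        # hypothetical sequence-number field from the header
    payload: bytes
    crc: int        # checksum computed by the sender over the payload


def make_packet(seq: int, payload: bytes) -> Packet:
    """Sender side: attach a CRC-32 over the payload (illustrative only)."""
    return Packet(seq, payload, zlib.crc32(payload))


def receive(packet: Packet) -> str:
    """Receiver side: acknowledge an intact packet, otherwise request a resend.

    A receiver could instead stay silent and let the sender's timer expire;
    this sketch uses the explicit-request policy.
    """
    if zlib.crc32(packet.payload) == packet.crc:
        return f"ACK {packet.seq}"
    return f"RESEND {packet.seq}"
```

Corrupting even a single payload bit changes the recomputed CRC, so `receive` asks for the packet again instead of acknowledging it.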
Large networks typically require more switches than small networks, and each switch has multiple ports. The larger the network, the more links (often referred to as “hops”) a packet typically traverses from a sending node to a receiving node, and the higher the probability that a bit error will occur on one of those links. There may be several different paths between a sender and a receiver, each comprising multiple links. When a problem in a transmission is detected, it can be difficult for either the receiving node or the sending node to determine which link is failing to transmit packets correctly. Thousands of nodes may be coupled through the network between a sending node and a receiving node, and handling an erroneous transmission end to end across the entire network path can consume unnecessary bandwidth. For example, a single bit error on one link typically requires retransmission of the entire packet over every link in the path from sender to receiver. In large networks, timeout periods can also become prolonged because of local switch or fabric congestion. If the timeout period is too short, it can further increase congestion through needless resending of packets that are still in transit, or for which another copy is still in transit. On the other hand, long timeout periods reduce throughput and increase recovery time.
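The effect of path length can be quantified with a simple model. Assume, hypothetically, that each link independently corrupts a packet with probability q. A packet crossing h hops then arrives intact with probability (1 - q)^h, so the path error probability is 1 - (1 - q)^h, roughly h·q when q is small. If, as a further simplifying assumption, every end-to-end attempt crosses all h links and any corruption forces a full resend, the number of attempts is geometric and the expected number of link traversals per delivered packet is h / (1 - q)^h. The function names below are illustrative only.

```python
def path_error_probability(per_link_error: float, hops: int) -> float:
    """Probability that at least one of `hops` independent links corrupts the packet."""
    if not 0.0 <= per_link_error <= 1.0 or hops < 1:
        raise ValueError("need 0 <= per_link_error <= 1 and hops >= 1")
    return 1.0 - (1.0 - per_link_error) ** hops


def expected_link_traversals(per_link_error: float, hops: int) -> float:
    """Expected link traversals to deliver one packet end to end.

    Assumes each attempt crosses all hops and any corruption anywhere on the
    path forces the whole packet to be resent from the source.
    """
    if not 0.0 <= per_link_error < 1.0 or hops < 1:
        raise ValueError("need 0 <= per_link_error < 1 and hops >= 1")
    success = (1.0 - per_link_error) ** hops  # per-attempt success probability
    return hops / success                     # hops * expected number of attempts
```

Comparing `expected_link_traversals(q, 2)` with `expected_link_traversals(q, 10)` for any q > 0 shows the retransmission overhead growing faster than the hop count itself, which is the bandwidth cost the paragraph above describes.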
Network data transmission typically involves a number of procedures for verifying the status of packets transmitted from senders to receivers. Senders (sources) of data packets must know whether the packets have arrived without error at receivers (destinations). Packets can be broadcast to all receivers capable of receiving packets from a particular sender, multicast to a subset of all potential receivers, or sent point-to-point to a single destination. Embodiments of the present invention are discussed herein with respect to InfiniBand® point-to-point data transmission; however, many aspects of the present invention can be applied to other protocols, types, and formats of data transmission.
Packets transmitted over a network typically are stored at the sending device until confirmation of receipt is obtained from the receiving device, because retransmission might be required if the packet is received with a bit error. The receiver acknowledges a received packet by returning an acknowledgment (“ACK”) to the sender, using any of a variety of protocols designed to indicate error-free receipt of individual packets or groups of packets. Packets are identified by numerical identifiers, typically assigned sequentially and preferably stored in each packet's header. The sender purges stored packets once the receiving device has acknowledged them as error free.
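The store-until-acknowledged behavior described above can be sketched as a small sender-side buffer. This is a minimal Python illustration under stated assumptions: the `RetransmitBuffer` class and its methods are hypothetical, acknowledgments are assumed to be cumulative (an ACK for sequence number n confirms every packet up to and including n, which is only one of the possible schemes), and sequence-number wraparound is ignored.

```python
from __future__ import annotations

from collections import OrderedDict


class RetransmitBuffer:
    """Hypothetical sender-side store of packets awaiting acknowledgment."""

    def __init__(self) -> None:
        self._next_seq = 0
        # Insertion order equals sequence order because numbers are assigned
        # sequentially, so the oldest unacknowledged packet is always first.
        self._pending: OrderedDict[int, bytes] = OrderedDict()

    def send(self, payload: bytes) -> int:
        """Assign the next sequence number and keep a copy until it is acknowledged."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = payload
        return seq

    def ack(self, seq: int) -> int:
        """Cumulative ACK: purge every stored packet numbered <= seq; return the count."""
        purged = 0
        while self._pending and next(iter(self._pending)) <= seq:
            self._pending.popitem(last=False)
            purged += 1
        return purged

    def unacknowledged(self) -> list[tuple[int, bytes]]:
        """(sequence number, payload) pairs still held for possible retransmission."""
        return list(self._pending.items())
```

A duplicate or stale ACK simply purges nothing, so the buffer tolerates acknowledgments that arrive late or more than once.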
A packet transmission failure can occur in several different ways. For example, a sender might not receive an ACK within a preselected timeout period, in which case the sender can resend one or a series of unacknowledged packets. As another example, an ACK packet might have been sent by the receiver but not received by the sender, either because of a network failure or because the ACK is still in transit. As a further example, a sequence number missing from a series of received packets can cause the receiver to request that the packet with the missing sequence number be resent, or that all packets beginning with the missing sequence number be resent. An erroneous packet is eventually discarded at the receiving device, because the receiver does not have sufficient information to correct the erroneous packet data.
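The missing-sequence-number case can be sketched from the receiver's side. In this minimal Python illustration, which is an assumption rather than InfiniBand®-specified behavior, the receiver looks for the lowest sequence number that is absent even though a later packet has arrived, then builds either a request for that single packet or a request to resend everything starting from it. The function names and the tuple request format are hypothetical; sequence-number wraparound and duplicate suppression are ignored.

```python
from __future__ import annotations


def first_missing(received: set[int], start: int) -> int | None:
    """Lowest sequence number >= start that is absent although a later one arrived."""
    if not received:
        return None
    highest = max(received)
    for seq in range(start, highest):  # highest itself was received
        if seq not in received:
            return seq
    return None  # no gap: everything from start up to highest is present


def resend_request(received: set[int], start: int, go_back: bool) -> tuple[str, int] | None:
    """Build a hypothetical resend request for the first gap, if any.

    go_back=False asks for only the missing packet ("single");
    go_back=True asks for all packets beginning with the missing one ("from").
    """
    missing = first_missing(received, start)
    if missing is None:
        return None
    return ("from", missing) if go_back else ("single", missing)
```

Requesting only the single missing packet saves bandwidth when later packets arrived intact, while the "resend from" form keeps the receiver simpler because it never has to buffer out-of-order packets.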