1. Field of the Invention
This invention relates generally to the field of computing technology and more particularly concerns reducing congestion in Infiniband-based data transmission systems.
2. Description of the Related Art
In the computing industry, data may be transferred over several different types of networks such as the Internet, Local Area Networks (LAN), Wide Area Networks (WAN), Storage Area Networks (SAN), etc. Data transfer over these types of networks typically involves data transfer protocols such as, for example, the transmission control protocol (TCP) and the internet protocol (IP).
Through use of TCP, data that is sent over a network is broken up into small pieces for transmission and reassembled once the data reaches its destination. The data may be sent in the form of, for example, data packets. Depending on the interface used, TCP may break data down into a variety of data packet sizes, such as 128-byte packets. TCP includes its own information which allows the data to be reassembled in the correct order and allows any data that happens to get “dropped” (data that is lost for reasons such as congestion over the network) to be resent. IP routes the data packaged by TCP to a destination such as a device within a network.
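By way of illustration only, the segmentation, reordering, and resending behavior described above may be sketched as follows. This is a simplified model and not an implementation of TCP itself; the function names `segment` and `reassemble`, the use of a simple integer sequence number, and the 512-byte sample message are assumptions made for this example.

```python
import random

def segment(data: bytes, size: int = 128):
    """Split data into (sequence_number, payload) pairs of at most `size` bytes."""
    return [(seq, data[off:off + size])
            for seq, off in enumerate(range(0, len(data), size))]

def reassemble(packets, total):
    """Rebuild the data in sequence order, or report which packets are missing."""
    received = dict(packets)
    missing = [seq for seq in range(total) if seq not in received]
    if missing:
        return None, missing
    return b"".join(received[seq] for seq in range(total)), []

message = bytes(range(256)) * 2                  # 512 bytes of sample data
packets = segment(message)                       # 128-byte packets

# Packet 2 is "dropped" and the remaining packets arrive out of order.
in_flight = [p for p in packets if p[0] != 2]
random.Random(0).shuffle(in_flight)
first_try, missing = reassemble(in_flight, len(packets))

# The receiver reports the gap, the sender resends, and reassembly succeeds.
in_flight.extend(p for p in packets if p[0] in missing)
second_try, still_missing = reassemble(in_flight, len(packets))
```

The sequence number carried with each payload is what permits both the reordering and the detection of the dropped packet in this sketch.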
As originally designed, TCP was intended to be a very fault tolerant protocol that could withstand catastrophic failures of the communication network. TCP was also designed with long range communication and messaging in mind. As a result, TCP is inherently a protocol that has high overhead for handling the communication of variable length segments. New transport media and methods are now available which avoid the need for complicated packet loss tolerant protocols such as TCP, especially when data is transferred over short range communication. Infiniband is one of the new data transport architectures that utilize link level control to efficiently send data in a coherent manner. Although Infiniband is a promising data transport architecture, congestion control for Infiniband is ineffective in many situations. Therefore, a method is needed to improve congestion control in Infiniband.
The Infiniband architecture is based on a unified computer interconnect fabric and provides a mechanism to share I/O interconnects among many servers. The Infiniband architecture typically creates a more efficient way to connect storage and communications networks and server clusters together, while delivering an efficient I/O infrastructure. The Infiniband architecture is based on channel I/O. Infiniband channels are created by attaching host adapters and target adapters through Infiniband switches. This interconnect infrastructure is called a “fabric,” based on the way input and output connections are constructed between target adapters and sending adapters. All Infiniband connections are created with Infiniband links, starting at data rates of 2.5 Gbps and utilizing both copper wire and fiber optics for transmission.
Infiniband features link level flow control that, in a few circumstances, can reduce congestion-caused packet loss and decrease the need for complicated, packet loss tolerant protocols such as TCP. Unfortunately, in many circumstances, link level flow control cannot prevent congestion due to a condition known as congestion spreading. Congestion spreading occurs when backups on overloaded links or nodes curtail traffic in other, otherwise unaffected paths. This curtailing of otherwise unaffected paths by overloaded links is typically known as head of line blocking.
FIG. 1A shows a simplified Infiniband data transmission system 10. System 10 includes an Infiniband sender 12 that transmits data to a switch 14. The data is then sent to a receiver 16. In such a simplified system, head of line blocking and other types of data transfer congestion rarely exist because only one link exists. Unfortunately, in real life network architectures, such a simplified system is not typically utilized.
Typically, when an output port of a switch runs out of buffers, the link level flow control will then apply back pressure to the input port of the switch to shut off further traffic. If the output port of a switch is full, then the input ports with packets for the affected output port will also shut down, possibly stranding packets destined for other output ports. This effect is known as “head of line” blocking, and is the mechanism through which congestion spreads to otherwise non-congested paths. Data transfer congestion can occur in numerous other types of switch configurations as well.
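The back pressure behavior described above may be sketched, purely for illustration, as an output port with a fixed number of buffers that refuses further packets once it is full. The class name `OutputPort` and its methods are hypothetical and are not taken from the Infiniband specification; the sketch models only the rule that a full port holds traffic back rather than dropping it.

```python
class OutputPort:
    """Hypothetical model of a switch output port with a fixed number of buffers."""

    def __init__(self, buffers):
        self.buffers = buffers
        self.queue = []

    def has_room(self):
        # Back pressure is asserted whenever no buffer is free.
        return len(self.queue) < self.buffers

    def accept(self, packet):
        """Take a packet from an input port, or refuse it if the port is full."""
        if not self.has_room():
            return False          # the input port must hold the packet
        self.queue.append(packet)
        return True

    def drain(self):
        """Send one packet out on the link, which frees one buffer."""
        return self.queue.pop(0) if self.queue else None

port = OutputPort(buffers=2)
accepted = [port.accept(p) for p in ("a", "b", "c")]   # third packet is refused
sent = port.drain()                                    # a buffer becomes free
retry = port.accept("c")                               # the held packet now fits
```

In this model no packet is lost; the refused packet simply waits at the input port, which is the condition that allows congestion to spread.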
FIG. 1B shows another example of a congested Infiniband system 20. At the instant in time shown, much of the traffic is destined to host-0 28 which is connected to an output port-16 34 of a switch 26. Because more packets are headed to the switch 26 than the host link can carry, the port-16 buffers fill up and exert back pressure 40 on all the input ports carrying host-0 28 bound data from a storage unit-0 22. In particular, when a packet headed for host-0 28 reaches the head-of-queue position in a buffer of an input port-0 30, it is held because no room is available in the buffers of output port-16 34. This blocks other traffic in the queue of input port-0 30 which might be headed for other hosts such as, for example, host-1 31 and host-2 32, whose links are not congested. Hence, the packet rate for host-1 31 and host-2 32 is also reduced, even though they are not part of the original congestion. Consequently, in prior art systems, head of line blocking and/or input buffer congestion results in data transport congestion, thereby dramatically reducing data transmission efficiency of the system.
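The effect described with reference to FIG. 1B may be illustrated with a small simulation. The sketch below is a toy model under stated assumptions and does not represent any actual switch: a single first-in, first-out input queue feeds three output ports, each output port has one buffer, and each host link needs a configurable number of ticks per packet. The function name `simulate`, the traffic pattern, and the tick counts are all hypothetical.

```python
from collections import deque

def simulate(input_queue, drain_period, buffers=1, ticks=13):
    """Forward packets from one FIFO input queue to per-host output ports.

    input_queue  -- destination host number of each queued packet, in order
    drain_period -- ticks each host link needs to carry one packet
    Returns the number of packets delivered to each host.
    """
    queue = deque(input_queue)
    occupied = {host: 0 for host in drain_period}    # full buffers per output port
    delivered = {host: 0 for host in drain_period}
    for tick in range(1, ticks + 1):
        # Each host link carries one buffered packet every drain_period ticks.
        for host, period in drain_period.items():
            if occupied[host] and tick % period == 0:
                occupied[host] -= 1
                delivered[host] += 1
        # Only the head-of-queue packet may move; if its output port is
        # full, every packet behind it waits as well.
        if queue and occupied[queue[0]] < buffers:
            occupied[queue.popleft()] += 1
    return delivered

traffic = [0, 1, 0, 2] * 3                  # half of the packets are for host 0

uncongested = simulate(traffic, {0: 1, 1: 1, 2: 1})
congested = simulate(traffic, {0: 4, 1: 1, 2: 1})   # host 0 link is 4x slower
print("uncongested:", uncongested)
print("congested:  ", congested)
```

In the congested run only the link to host 0 is slowed, yet fewer packets reach host 1 and host 2 as well, because their packets sit behind host 0 packets in the same input queue.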
In view of the foregoing, what is needed is a new and improved methodology for reducing congestion during data transfer and storage operations that utilize the Infiniband data transport architecture. Such an approach would take advantage of the full data transfer capabilities of the transmission media of the Infiniband architecture, and would significantly reduce head of line blocking to optimize data transfer throughput.