Current high-performance applications inject increasingly unpredictable bursty traffic into data center networks, causing network congestion and degrading their own and other applications' performance. Congestion control protocols have been developed to alleviate these problems. These protocols inform traffic sources about the congestion in the network. Using this information, the traffic sources reduce the injection rate of their traffic. When congestion is not indicated, the traffic sources continually attempt to increase their traffic injection rates. The performance of the congestion control mechanism depends on several factors, such as notification delay, accuracy of notification, and the trigger of congestion.
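The reaction of a traffic source to congestion notifications can be illustrated with a minimal sketch. The text above does not specify how the injection rate is adjusted, so the additive-increase, multiplicative-decrease rule shown here, along with the function name `next_rate` and all of its parameters and default values, is an assumption made purely for illustration.

```python
def next_rate(rate, congested, decrease_factor=0.5, increase_step=1.0, max_rate=100.0):
    """Return the next injection rate for a traffic source.

    All parameters are hypothetical; the units are arbitrary (e.g. Gb/s).
    """
    if congested:
        # Congestion was indicated: cut the injection rate (assumed multiplicative).
        return rate * decrease_factor
    # No congestion indicated: keep probing for more bandwidth (assumed additive),
    # without exceeding the assumed link limit.
    return min(rate + increase_step, max_rate)
```

Under these assumptions, a source sending at rate 40 drops to 20 when congestion is indicated and rises to 41 otherwise.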
Congestion control protocols for large-scale data centers are based mainly on forward explicit congestion notification (FECN), meaning that the congestion notification is propagated first from the detection point to the destination and is then reflected back from the destination to the traffic source. Typically, congested switches send notifications to the destinations of packets that they forward by setting a specific FECN bit in the packet headers. Direct feedback based on backward explicit congestion notification (BECN), meaning that the congestion notification is returned directly from the congested switch to the traffic source, is currently used, as a rule, only in smaller, Layer-2 networks.
When the network interface controller (NIC) at the destination of a given flow receives a packet with the FECN bit set, the NIC is expected to notify the source of the packet about the congestion. The NIC typically sends this notification by returning a packet to the source of the flow with a BECN bit set. In InfiniBand® networks, for example, the NIC may either send an acknowledgement packet (ACK) with the BECN bit set, when communicating with the packet source over a reliable connection, or it may send a dedicated congestion notification packet (CNP).
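The reflection of a FECN indication at the destination NIC may be sketched as follows. This is a simplified model rather than an actual NIC or InfiniBand implementation: the `Packet` class, its fields, and the `reflect_congestion` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical, simplified packet model; real headers carry many more fields.
    source: str
    destination: str
    kind: str = "DATA"   # "DATA", "ACK", or "CNP"
    fecn: bool = False   # set by a congested switch on the forward path
    becn: bool = False   # set by the destination NIC on the return path


def reflect_congestion(pkt, reliable_connection):
    """Return the congestion notification the destination NIC sends, or None."""
    if not pkt.fecn:
        # No congestion was marked on the forward path, so nothing is reflected.
        return None
    # Over a reliable connection the notification rides on an ACK;
    # otherwise a dedicated congestion notification packet is used.
    kind = "ACK" if reliable_connection else "CNP"
    # The notification travels in the reverse direction, back to the flow source,
    # and is modeled here with the BECN flag set in both cases.
    return Packet(source=pkt.destination, destination=pkt.source, kind=kind, becn=True)
```

For example, a FECN-marked packet from host A to host B over a reliable connection yields an ACK from B to A with the BECN flag set, while the same packet over an unreliable transport yields a CNP.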
Internet Protocol (IP) networks, on the other hand, commonly use the Transmission Control Protocol (TCP) as their transport-layer protocol. The congestion control features of TCP are set forth by Allman et al., in “TCP Congestion Control,” Request for Comments (RFC) 5681 of the Internet Engineering Task Force (IETF), published in 2009, which is incorporated herein by reference. This document specifies four TCP congestion control algorithms: slow start, congestion avoidance, fast retransmit and fast recovery. The slow start and congestion avoidance algorithms are used by TCP senders to control the amount of outstanding data being injected into the network. To implement these algorithms, two variables are added to the TCP per-connection state: The congestion window (cwnd) is a sender-side limit on the amount of data the sender can transmit into the network before receiving an acknowledgment (ACK), while the receiver's advertised window (rwnd) is a receiver-side limit on the amount of outstanding data. The minimum of cwnd and rwnd governs data transmission. Upon encountering an indication of congestion, the receiver instructs the sender to reduce the window size, and the sender reduces the transmission rate accordingly.
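The rule that the minimum of cwnd and rwnd governs data transmission can be shown with a short sketch. The function name `sendable_bytes` and the `bytes_in_flight` parameter are hypothetical and serve only to illustrate the relationship; they are not taken from RFC 5681 or from any particular TCP stack.

```python
def sendable_bytes(cwnd, rwnd, bytes_in_flight):
    """Return how many more bytes the sender may transmit right now.

    cwnd: sender-side congestion window, in bytes
    rwnd: receiver's advertised window, in bytes
    bytes_in_flight: data already sent but not yet acknowledged (hypothetical name)
    """
    # The smaller of the two windows limits the amount of outstanding data.
    effective_window = min(cwnd, rwnd)
    # Data already in flight consumes part of that window.
    return max(0, effective_window - bytes_in_flight)
```

With a cwnd of 8000 bytes, an rwnd of 12000 bytes, and 3000 bytes unacknowledged, the sender may transmit a further 5000 bytes; if rwnd shrinks to 4000 bytes, the allowance falls to 1000 bytes.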