The present invention relates generally to traffic control, and, in particular, to adaptive congestion control.
Data Center Ethernet (DCE) is an emerging industry standard which proposes modifications to existing networks, in an effort to position Ethernet as the preferred convergence fabric for all types of data center traffic. A recent study has identified Ethernet as the convergence fabric for I/O consolidation in a data center, as shown in FIG. 1. This consolidation is expected to simplify platform architecture and reduce overall platform costs. More details of proposals for consolidation are described in “Proposal for Traffic Differentiation in Ethernet Networks,” which may be found at http://www.ieee802.org/1/files/public/docs2005/new-wadekar-virtual%20-links-0305.pdf.
Major changes have been proposed for DCE (also referred to as enhanced Ethernet and low latency Ethernet), including the addition of credit-based flow control at the link layer, congestion detection and data rate throttling, and the addition of virtual lanes with quality of service differentiation. It is important to note that these functions do not affect Transmission Control Protocol/Internet Protocol (TCP/IP), which exists above the DCE level. It should also be noted that DCE is intended to operate without the overhead of TCP/IP. This offers a much simpler, low-cost approach that does not require offload processing or accelerators.
Implementation of DCE will require a new DCE-compatible network interface card at the server, storage control unit, and Ethernet switch, most likely capable of 10 Gigabit data rates. There are also server-related architectural efforts, including low latency Ethernet for high performance servers and encapsulation of various other protocols in a DCE fabric, to facilitate migration to a converged DCE network over the next several years. This new architecture for data center networks presents many technical challenges.
Conventional Ethernet networks running under TCP/IP are allowed to drop data packets under certain conditions. These networks are known as “best effort” or lossy networks. Networks using other protocols, such as Asynchronous Transfer Mode (ATM), also use this approach. Such networks rely on dropped packets to detect congestion. In a network using TCP/IP, the TCP/IP software provides a form of end-to-end flow control. However, recovery from dropped packets can incur a significant latency penalty. Furthermore, any network resources already consumed by packets that are subsequently dropped are wasted. It has been well established that enterprise data center environments require a lossless protocol that does not drop packets unless they are corrupted. An enterprise data center environment also requires much faster recovery mechanisms, such as those of Fibre Channel Protocol, InfiniBand, etc. Lossless networks prevent buffer overflows, offer faster response time to recover corrupted packets, do not suffer from loss-induced throughput limitations, and allow bursts of traffic to enter the network without delay, at full bandwidth. It is important to note that these functions do not affect TCP/IP, which is above the DCE level. Some other form of flow control and congestion resolution is needed to address these concerns.
Networks using credit-based flow control are subject to congestion “hot spots”. This problem is illustrated in FIGS. 2A-2D. The example illustrated in these figures shows a switch fabric with three layers of cascaded switching (switch layer 1, switch layer 2, and switch layer 3) and their associated traffic flows. While three switch layers are shown for simplicity of illustration, it should be appreciated that a switch fabric may contain many more switch layers.
In FIG. 2A, traffic flows smoothly without congestion. However, as shown in FIG. 2B, if a sufficient fraction of all the input traffic targets the same output port, that output link may saturate, forming a “hot spot” 210. This causes the queues on the switches feeding the link to fill up. If the traffic pattern persists, available buffer space on the switches may be exhausted. This, in turn, may cause the previous stage of switching to saturate its buffer space, forming additional hot spots 220 and 230 as shown in FIG. 2C. The congestion eventually may back up all the way to the network input nodes, forming hot spots 240-256. This is referred to as congestion spread or tree saturation. One or more saturation trees may develop at the same time and spread through the network very quickly. In a fully formed saturation tree, every packet must cross at least one saturated switch on its way through the network. The network, as a whole, can suffer a catastrophic loss of throughput as a result.
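The backward spread of congestion described above can be illustrated with a toy simulation. The following Python sketch is a hypothetical model, not taken from the source: each switch layer on the path to the hot output port is collapsed into a single aggregate queue, a layer may forward a packet only when the next layer has a free buffer slot (a credit), and the hot output link drains one packet per tick while the inputs offer two. The function name `simulate` and all parameter values are illustrative assumptions.

```python
"""Toy model of congestion spread (tree saturation) under credit-based flow control.

Hypothetical assumptions, for illustration only: each switch layer on the path to
the hot output port is one aggregate queue with a fixed buffer capacity; a layer
may forward a packet only if the next layer has a free slot (a credit); the hot
output link drains `drain_rate` packets per tick.
"""


def simulate(num_layers=3, capacity=8, offered=2, link_rate=4,
             drain_rate=1, ticks=50):
    queues = [0] * num_layers          # queues[0] = switch layer 1 (ingress side)
    first_full = [None] * num_layers   # tick at which each layer is first seen full
    blocked_at_ingress = 0             # packets the sources could not inject

    for t in range(ticks):
        # The saturated output link drains the last switch layer.
        queues[-1] -= min(queues[-1], drain_rate)

        # Forward packets layer by layer, starting nearest the output so that
        # freed slots become credits for the upstream layer in the same tick.
        for i in range(num_layers - 2, -1, -1):
            credits = capacity - queues[i + 1]
            moved = min(queues[i], link_rate, credits)
            queues[i] -= moved
            queues[i + 1] += moved

        # Sources inject new traffic into layer 1, limited by its credits.
        accepted = min(offered, capacity - queues[0])
        queues[0] += accepted
        blocked_at_ingress += offered - accepted

        for i, q in enumerate(queues):
            if q == capacity and first_full[i] is None:
                first_full[i] = t
    return first_full, blocked_at_ingress


if __name__ == "__main__":
    first_full, blocked = simulate()
    for layer, t in enumerate(first_full, start=1):
        print(f"switch layer {layer} saturated at tick {t}")
    print(f"packets held back at the network inputs: {blocked}")
```

With the default parameters, switch layer 3 (nearest the hot spot) fills first, then layer 2, then layer 1, after which the network inputs themselves are held back, mirroring the progression described for FIGS. 2B-2D. If the offered load is reduced to the drain rate (`simulate(offered=1)`), no layer saturates.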
There have been several proposed solutions to this problem. One proposed solution involves detecting a potential buffer overflow condition at the switch and broadcasting a message downstream to the destination, then back to the source, requesting that the data rate be throttled back. This round trip takes time. This approach also relies on a preset threshold in the switch for detecting when a buffer is nearing saturation. Bursts of traffic may cause the switch to exceed its threshold level quickly and then subside just as quickly. A single threshold based on traffic volume cannot compensate fast enough under these conditions.
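The weakness of a single preset threshold combined with a slow feedback path can likewise be sketched. In the hypothetical model below (the function name `threshold_feedback`, its parameters, and the traffic pattern are assumptions for illustration, not details from the source), a switch compares its queue occupancy against a fixed threshold and requests throttling, but the request reaches the source only `delay` ticks later, standing in for the trip to the destination and back.

```python
"""Sketch: why a single fixed threshold with delayed feedback misses short bursts.

Hypothetical model (not from the source): one switch queue, a fixed occupancy
threshold, and a throttle request that takes effect at the traffic source only
`delay` ticks after the threshold is crossed.
"""


def threshold_feedback(arrivals, capacity=20, threshold=12, service=4,
                       delay=4, throttled_rate=1):
    queue = 0
    throttle_ticks = set()  # ticks at which a throttle request reaches the source
    blocked = 0             # packets refused because the buffer was full
    applied = 0             # ticks on which the source actually throttled
    late = 0                # throttles applied after occupancy already fell back
    for t, offered in enumerate(arrivals):
        if t in throttle_ticks:
            offered = min(offered, throttled_rate)
            applied += 1
            if queue <= threshold:
                late += 1
        accepted = min(offered, capacity - queue)
        blocked += offered - accepted   # in a lossless fabric these back up upstream
        queue += accepted
        queue -= min(queue, service)    # switch forwards up to `service` per tick
        if queue > threshold:           # fixed threshold checked after forwarding
            throttle_ticks.add(t + delay)
    return blocked, applied, late


if __name__ == "__main__":
    # Three bursts of 10 packets/tick lasting 3 ticks, each followed by light traffic.
    pattern = [10, 10, 10] + [2] * 7
    blocked, applied, late = threshold_feedback(pattern * 3)
    print(f"blocked={blocked} throttles_applied={applied} late={late}")
```

In this run every throttle takes effect after the burst has already drained, so the source is slowed during light traffic while the buffer still filled completely during each burst.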
Many other conventional schemes require some a priori knowledge of where the congestion point is located. These schemes only work well for traffic patterns that are predictable and are not suited for mixed traffic having unpredictable traffic patterns.
Another common workaround involves allocating excess bandwidth, or over-provisioning the network, to avoid hot spot formation. However, over-provisioning does not scale well as the number of network nodes increases and is an expensive solution as data rates approach 10 Gbit/s. Furthermore, DCE is intended to mix different data traffic patterns (voice, storage, streaming video, and other enterprise data) onto a single network. This makes it much more likely that DCE will encounter hot spot congestion, since the traffic pattern is less predictable.