Network congestion can inhibit the performance of large-scale high-performance computing (HPC) systems and other network systems. Because HPC networks are lossless, a single point of congestion can spread through the network, a condition known as tree saturation. Tree saturation occurs when a full buffer at the input of a switch causes the upstream switch to halt transmission. Packets, the units of data that are routed between a source and a destination, accumulate in the upstream switch, which then reaches capacity and causes additional switches to halt transmission. Eventually a tree of congested packets fans out from the original point of congestion to the rest of the network. In a shared system, the congestion caused by one application can impact other applications on the system, leading to wide performance variability.
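By way of illustration only, the backpressure behavior described above can be sketched for a single branch of the congestion tree. The following is a minimal toy model, not a description of any particular network: the function name `simulate_backpressure`, the buffer capacity, the one-packet-per-step link rate, and the `drain_every` parameter are all assumptions introduced for this example.

```python
def simulate_backpressure(num_switches=4, capacity=3, steps=20, drain_every=0):
    """Toy lossless chain of switches (hypothetical model).

    Switch 0 is nearest the traffic source; the last switch feeds the endpoint.
    drain_every=0 models a fully blocked endpoint; drain_every=k lets the
    endpoint accept one packet every k steps.
    Returns (buffer occupancy per switch, number of stalled injections).
    """
    buffers = [0] * num_switches
    stalled_injections = 0
    for step in range(1, steps + 1):
        # The endpoint removes a packet from the last switch, if it is able to.
        if drain_every and step % drain_every == 0 and buffers[-1] > 0:
            buffers[-1] -= 1
        # Each switch forwards one packet only if the next buffer has room.
        # A full downstream buffer halts the upstream switch (no packet loss).
        for i in range(num_switches - 2, -1, -1):
            if buffers[i] > 0 and buffers[i + 1] < capacity:
                buffers[i] -= 1
                buffers[i + 1] += 1
        # The source injects one packet per step unless the first buffer is full.
        if buffers[0] < capacity:
            buffers[0] += 1
        else:
            stalled_injections += 1
    return buffers, stalled_injections


if __name__ == "__main__":
    print(simulate_backpressure(drain_every=0))  # blocked endpoint
    print(simulate_backpressure(drain_every=1))  # endpoint keeps up
```

With a blocked endpoint, the buffers in this sketch fill from the endpoint back toward the source until every switch is at capacity and the source itself is stalled. When the endpoint keeps pace with the source, no buffer fills and no injection stalls.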
Mechanisms and protocols have been developed to address network congestion. Some adaptive routing algorithms address fabric congestion but not endpoint congestion. Few HPC networks have hardware mechanisms for dealing with endpoint congestion, which requires admission control at the traffic sources. Without mechanisms to manage endpoint congestion, HPC systems rely on software-level tuning to reduce the impact of congestion.
One hardware approach to resolving endpoint congestion is the use of congestion notification, such as the Explicit Congestion Notification (ECN) protocol. ECN can signal network congestion and reduce the traffic injection rate, and it has been shown to work well for long-duration network congestion scenarios. However, ECN is a reactive protocol: it responds to congestion only after the congestion has occurred. It takes time for ECN to detect and throttle the congestion-causing traffic, leading to slow response times. In addition, ECN is highly sensitive to throttling parameters, and a single set of parameters cannot adequately handle all congestion scenarios.
The Speculative Reservation Protocol (SRP), disclosed in U.S. Pat. No. 9,025,456 and hereby fully incorporated by reference, addresses endpoint congestion for large-scale lossless networks. SRP operates on the principle of congestion avoidance, actively combating the formation of endpoint congestion. It uses a lightweight reservation handshake between the traffic source and destination to ensure that no network endpoint is overloaded. To reduce the latency increase associated with the reservation handshake, SRP allows the traffic source to send lossy speculative packets to mask the reservation latency overhead. These speculative packets can be dropped by the otherwise lossless network if they begin to create congestion. SRP has been shown to work well for medium and large message transfers where the size of the payload is large enough to amortize the cost of reservation control measures. However, HPC networks are not always dominated by large message transfers. Network endpoint congestion can also be caused by small message traffic or fine-grained communication, which is difficult to address.
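By way of illustration only, the reservation handshake and the treatment of speculative packets described above might be modeled as follows. This is a simplified sketch under stated assumptions and is not the implementation disclosed in the referenced patent: the `Endpoint` class, the `reserve` method, the `speculative_outcome` function, the one-packet-per-time-unit ejection rate, and the `max_delay` drop threshold are all hypothetical names and parameters introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Destination that grants non-overlapping receive windows (hypothetical)."""

    next_free: int = 0  # earliest time the endpoint can accept reserved traffic

    def reserve(self, now: int, num_packets: int) -> int:
        """Grant a start time so that reserved transfers never overlap."""
        start = max(now, self.next_free)
        # Assumption: the endpoint ejects one packet per time unit.
        self.next_free = start + num_packets
        return start


def speculative_outcome(queue_delay: int, max_delay: int) -> str:
    """Speculative packets are lossy: drop one that has queued too long.

    Assumption: queueing delay is used here as the sign that the packet is
    starting to contribute to congestion.
    """
    return "delivered" if queue_delay <= max_delay else "dropped"


if __name__ == "__main__":
    dest = Endpoint()
    # Two sources request a four-packet transfer to the same endpoint at time 0.
    print(dest.reserve(now=0, num_packets=4))
    print(dest.reserve(now=0, num_packets=4))
    # Speculative packets sent while waiting for the grant.
    print(speculative_outcome(queue_delay=1, max_delay=2))
    print(speculative_outcome(queue_delay=5, max_delay=2))
```

In this sketch the second source is granted a window that begins only after the first transfer completes, so the endpoint is never offered more traffic than it can eject. The fixed cost of one reservation per message is also visible here: for a one-packet message, the handshake is comparable in size to the payload itself.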
A need therefore persists for a protocol that can proactively resolve endpoint congestion caused by small messages, with fast reaction times and low overhead.