As known in the field of computer networking, tail-drop is a traffic management technique implemented by network devices such as routers and switches for handling congestion caused by loss-tolerant (i.e., lossy) traffic. When tail-drop is enabled on a network device for a given traffic class, the network device monitors the depths of egress queues that are associated with the traffic class. If the depth of a particular egress queue exceeds a predefined tail-drop threshold, the network device drops any further packets destined for that egress queue until its queue depth falls back below the threshold.
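The tail-drop behavior described above can be illustrated with a minimal sketch. The class name `TailDropQueue`, its fields, and the byte-based depth accounting are hypothetical choices made for illustration; an actual network device implements this logic in hardware and may measure depth in cells or packets rather than bytes.

```python
from collections import deque
from typing import Optional


class TailDropQueue:
    """Illustrative egress queue with a tail-drop threshold (hypothetical model)."""

    def __init__(self, threshold_bytes: int) -> None:
        self.threshold_bytes = threshold_bytes  # predefined tail-drop threshold
        self.depth_bytes = 0                    # current queue depth
        self.packets = deque()
        self.dropped = 0                        # count of tail-dropped packets

    def enqueue(self, packet: bytes) -> bool:
        """Queue the packet, or drop it while the depth exceeds the threshold."""
        if self.depth_bytes > self.threshold_bytes:
            self.dropped += 1
            return False
        self.packets.append(packet)
        self.depth_bytes += len(packet)
        return True

    def dequeue(self) -> Optional[bytes]:
        """Transmit the oldest packet, which lowers the queue depth."""
        if not self.packets:
            return None
        packet = self.packets.popleft()
        self.depth_bytes -= len(packet)
        return packet
```

In this sketch, packets are accepted again as soon as enough dequeues bring the depth back to or below the threshold, mirroring the description above.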
In contrast to tail-drop, priority-based flow control, or PFC (defined in IEEE standard 802.1Qbb), is a traffic management technique that is implemented by network devices for handling congestion caused by loss-sensitive (i.e., lossless) traffic. When PFC is enabled on a network device for a given ingress port P and traffic class TC 1, the network device monitors the usage of ingress buffers that are associated with TC 1. If TC 1 traffic received on P causes the ingress buffer usage to exceed a predefined PFC threshold (also known as an XOFF value), the network device transmits a PAUSE frame to the traffic sender (i.e., the device connected to P). The PAUSE frame causes the traffic sender to stop sending traffic corresponding to TC 1 for a specified period of time, thereby allowing the ingress buffer congestion on the receiving network device to subside (without having to drop any packets).
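The PFC trigger described above can be sketched in a similar way. The class `PfcIngressBuffer`, the `receive` and `drain` methods, and the re-arming rule (a new PAUSE frame may be sent once usage has fallen back to or below the XOFF value) are assumptions made for illustration; they are not taken from IEEE 802.1Qbb, and the sketch does not model the pause duration carried in the frame.

```python
class PfcIngressBuffer:
    """Illustrative ingress buffer accounting for one port and traffic class."""

    def __init__(self, xoff_bytes: int) -> None:
        self.xoff_bytes = xoff_bytes  # predefined PFC threshold (XOFF value)
        self.usage_bytes = 0          # current ingress buffer usage
        self.pause_sent = False       # True while the sender is assumed paused

    def receive(self, length: int) -> bool:
        """Account for a received packet.

        Returns True when a PAUSE frame should be transmitted to the sender,
        i.e. when this packet pushes usage above the XOFF value.
        """
        self.usage_bytes += length
        if self.usage_bytes > self.xoff_bytes and not self.pause_sent:
            self.pause_sent = True
            return True
        return False

    def drain(self, length: int) -> None:
        """Account for packets forwarded out of the ingress buffer."""
        self.usage_bytes = max(0, self.usage_bytes - length)
        if self.usage_bytes <= self.xoff_bytes:
            # Assumption: re-arm the trigger once congestion has subsided.
            self.pause_sent = False
```

Note that `receive` keeps accepting packets after the PAUSE frame is triggered; this reflects the in-flight traffic that arrives before the sender stops transmitting.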
In conventional network devices, packet buffer memory is typically shared across ingress buffers and egress queues for all ports and traffic classes. In such a shared memory model, the memory requirements for tail-drop and PFC are in direct conflict. With tail-drop, it is desirable to allocate a large amount of packet buffer space to the egress queues, so that the network device can absorb traffic bursts on the egress side with minimal packet loss; this allocation reduces the amount of memory available for the ingress buffers. With PFC, on the other hand, it is generally desirable to allocate a large amount of packet buffer space to the ingress buffers, so that there is sufficient headroom on the ingress side both to reach the PFC threshold (and thereby trigger transmission of the PAUSE frame) and to buffer in-flight packets that the sender transmits before it is able to pause transmission; this allocation reduces the amount of memory available for the egress queues.
These conflicting memory requirements mean that conventional network devices cannot properly support tail-drop and PFC for different traffic classes (or for the same traffic class on different ports) at the same time. To understand this, consider a scenario where tail-drop is enabled for traffic class TC 0 and PFC is enabled for traffic class TC 1, both on port P. The enablement of PFC for TC 1 should, in theory, guarantee that TC 1 traffic is not dropped (i.e., remains lossless) when congestion occurs. However, assume that the network device has a total shared packet buffer memory of 12 megabytes (MB), and that the volume of traffic for TC 0 causes the egress queues associated with TC 0 to consume 10 MB. In this case, if the PFC threshold is set at 3 MB, that threshold will never be hit for TC 1, because only 2 MB remains available for ingress buffers. This, in turn, means that excess traffic for TC 1 will be dropped on the ingress side once ingress buffer usage reaches 2 MB (because a PAUSE frame is never transmitted to the traffic sender), thereby violating the guarantee that TC 1 traffic remains lossless.
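The arithmetic in this scenario can be checked with a short sketch. The function name `pfc_threshold_reachable` and its parameters are hypothetical, and the model assumes that all memory not consumed by egress queues is available to the ingress buffers.

```python
MB = 1024 * 1024


def pfc_threshold_reachable(total_bytes: int,
                            egress_used_bytes: int,
                            xoff_bytes: int) -> tuple:
    """Return (bytes left for ingress buffers, whether XOFF can be exceeded).

    Assumes a single shared pool in which memory not held by egress queues
    is available to ingress buffers.
    """
    available_ingress = total_bytes - egress_used_bytes
    # The PAUSE frame is triggered only if ingress usage can exceed XOFF.
    return available_ingress, available_ingress > xoff_bytes


# Scenario from the text: 12 MB total, 10 MB held by TC 0 egress queues,
# PFC threshold of 3 MB for TC 1.
available, reachable = pfc_threshold_reachable(12 * MB, 10 * MB, 3 * MB)
print(available // MB, reachable)  # prints "2 False"
```

With only 2 MB left for ingress buffers, the 3 MB threshold cannot be reached, so no PAUSE frame is sent and TC 1 packets are dropped instead. If the TC 0 egress queues held 8 MB rather than 10 MB, 4 MB would remain and the threshold could be exceeded.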