Unlike other link layer protocols, such as Fibre Channel or InfiniBand, the Ethernet link layer protocol did not originally provide a link layer flow control mechanism. Consequently, if the Ethernet node at one end of a link sent a frame for which the node at the other end had no free receive buffer, the receiving node would simply drop the frame, or packet. Furthermore, when this occurred, there was no mechanism at the link layer for the receiving node to notify the remote node that it had dropped the frame. Typically in this case, an upper level protocol detects that it did not receive a frame it was expecting (the dropped frame) within a timeout period and requests retransmission of the frame. This upper layer timeout and retransmission error recovery solution is undesirable for several reasons. First, significant latency (e.g., the timeout period) may be introduced. Second, the timeout detection and retransmission may involve the host software in the server, leaving fewer CPU cycles available for the application workload. Third, even if an offload engine handles the retransmission, power consumption and complexity of the network adapter may be increased.
To address this problem, a flow control solution was devised in which a receiving Ethernet node is enabled to send a frame to the sending node instructing the sending node not to send any more frames for a pause time, expressed as a number of time quanta, specified in the frame. This frame is referred to as a PAUSE frame and is defined by the IEEE 802.3x standard. A PAUSE frame may also be sent to restart the flow before the pause time expires.
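For illustration only, the layout of a PAUSE frame can be sketched in Python as follows. The constants reflect the MAC Control frame format as commonly documented for IEEE 802.3x (reserved multicast destination 01-80-C2-00-00-01, EtherType 0x8808, opcode 0x0001, and a 16-bit pause time in units of 512 bit times); they are not taken from the text above. The function names build_pause_frame and pause_seconds are hypothetical, and the frame check sequence is omitted because it is normally appended by the hardware.

```python
import struct

# Values below follow the commonly documented 802.3x MAC Control format;
# the helper names are hypothetical and exist only for this sketch.
PAUSE_DST = bytes.fromhex("0180c2000001")   # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_NO_FCS = 60                       # minimum frame size, excluding the 4-byte FCS


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a PAUSE frame (without FCS) requesting a pause of `quanta` time quanta."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 16 bits")
    header = PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Pad with zeros up to the minimum Ethernet frame size.
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


def pause_seconds(quanta: int, link_bps: int) -> float:
    """Convert a pause time in quanta (512 bit times each) to seconds."""
    return quanta * 512 / link_bps


if __name__ == "__main__":
    src = bytes.fromhex("020000000001")     # locally administered example address
    stop = build_pause_frame(src, 0xFFFF)   # request the longest possible pause
    resume = build_pause_frame(src, 0)      # a pause time of zero restarts the flow
    print(stop.hex())
    print(resume.hex())
    print(pause_seconds(0xFFFF, 10_000_000_000))
```

Under these assumptions, the longest pause that a single frame can request on a 10 Gb/s link is roughly 3.4 milliseconds, which is why a receiver that remains congested must keep refreshing the pause.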
However, a problem subsequently emerged with the PAUSE frame solution. Different flows of data over an Ethernet link may specify different classes of service, as defined by the IEEE 802.1p standard. A PAUSE frame stops transmission on the link for all classes of service. This is particularly problematic for data center bridging installations that employ higher-level protocols requiring lossless behavior, such as Fibre Channel over Ethernet (FCoE) and the protocols used in clustered High Performance Computing (HPC) applications. Data center bridging installations often use the same Ethernet links to transmit FCoE frames (and frames of other protocols requiring lossless behavior) along with frames of other protocols having different classes of service. These other classes, such as real-time audio or video data, may not require lossless transmission but may require high performance, which is stifled when the PAUSE frame flow control mechanism halts the entire link.
To solve this problem, an enhancement was added to allow the pausing and time quanta to be specified individually for each of eight different priority classes. The modified PAUSE frame is referred to as a Per Priority Pause (PPP) frame or Priority Flow Control (PFC) frame, and is defined in the IEEE 802.1Qbb standard.
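As a rough sketch of how the per-priority variant differs, the following Python builds a PPP/PFC frame. The field layout used here (opcode 0x0101, a 16-bit class-enable vector whose low eight bits select priorities 0 through 7, followed by eight 16-bit pause times) is the layout commonly documented for IEEE 802.1Qbb and is an assumption relative to the text above. The function name build_pfc_frame is hypothetical, and the frame check sequence is omitted.

```python
import struct
from typing import Mapping

# Values below follow the commonly documented 802.1Qbb frame layout;
# build_pfc_frame is a hypothetical helper for this sketch.
PFC_DST = bytes.fromhex("0180c2000001")     # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_NO_FCS = 60                       # minimum frame size, excluding the 4-byte FCS


def build_pfc_frame(src_mac: bytes, pause_quanta: Mapping[int, int]) -> bytes:
    """Return a PFC frame (without FCS).

    `pause_quanta` maps a priority (0-7) to its pause time in quanta.
    Priorities that are absent from the mapping are left unaffected.
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    enable_vector = 0
    times = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("quanta must fit in 16 bits")
        enable_vector |= 1 << priority      # mark this priority's time field as valid
        times[priority] = quanta
    header = PFC_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *times)
    return (header + payload).ljust(MIN_FRAME_NO_FCS, b"\x00")


if __name__ == "__main__":
    src = bytes.fromhex("020000000001")
    # Pause priority 3 (for example, a lossless storage class) for the maximum
    # time and resume priority 5; all other priorities keep flowing.
    frame = build_pfc_frame(src, {3: 0xFFFF, 5: 0})
    print(frame.hex())
```

The key design point is that a priority whose bit is clear in the enable vector is not touched at all, so traffic classes that do not need lossless delivery are not stalled by congestion in a class that does.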
The above Ethernet link layer flow control approaches may be characterized as negative feedback flow control solutions because the receiving node notifies the sending node to stop sending frames in the event of its inability to receive incoming frames. However, it has been observed that the negative feedback flow control scheme may not provide lossless behavior as desired. According to the conventional Ethernet protocol, if a node detects an error in a received frame (e.g., a CRC error), it does not notify the node that sent the frame of the error, but instead simply drops the erroneous frame. If the dropped frame is a PPP/PFC frame for a given service class, the sending node for which the PPP/PFC frame was intended will not know that it is supposed to stop sending frames for that service class. The sending node therefore continues to transmit, which may result in a buffer overflow and dropped frames for the service class at the receiving node, and thus in a failure to provide lossless behavior. Therefore, an improved Ethernet link layer flow control solution is needed.
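The failure mode described above can be illustrated with a deliberately simplified toy model, which is not a model of any real adapter. The sender offers one frame per tick for a single service class, the receiver drains one frame every few ticks, and the receiver asks the sender to pause whenever its buffer occupancy reaches a threshold. The function simulate and all of its parameters are hypothetical; the pfc_lost flag models the case in which every pause request is corrupted on the wire and silently discarded.

```python
def simulate(capacity: int, threshold: int, drain_every: int,
             ticks: int, pfc_lost: bool) -> int:
    """Return the number of frames the receiver had to drop.

    Toy model: one service class, one frame offered per tick, and a receiver
    that frees one buffer every `drain_every` ticks.
    """
    occupancy = 0     # frames currently held in the receive buffer
    dropped = 0
    paused = False    # sender's view: has it been told to stop?
    for tick in range(ticks):
        # Sender transmits unless it believes it has been paused.
        if not paused:
            if occupancy < capacity:
                occupancy += 1
            else:
                dropped += 1          # no free buffer: the frame is lost
        # Receiver consumes a frame periodically.
        if tick % drain_every == drain_every - 1 and occupancy > 0:
            occupancy -= 1
        # Negative feedback: request a pause at the threshold, resume below it.
        if occupancy >= threshold:
            if not pfc_lost:          # a corrupted pause request never takes effect
                paused = True
        else:
            paused = False
    return dropped


if __name__ == "__main__":
    args = dict(capacity=8, threshold=6, drain_every=4, ticks=100)
    print("pause requests delivered:", simulate(pfc_lost=False, **args))
    print("pause requests lost:     ", simulate(pfc_lost=True, **args))
```

In this sketch, no frames are dropped while the pause requests arrive intact, whereas the same traffic overflows the buffer once the pause requests are lost, because nothing tells the sender that its feedback channel has failed.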