1. Field of the Invention
This invention generally relates to digital communications and, more particularly, to a system and method for controlling the transmit buffering of Ethernet communications.
2. Description of the Related Art
In a typical Ethernet transmit implementation, the entire packet or a part of the packet is loaded into an on-chip first-in first-out (FIFO) memory before transmission begins. This decouples the latency of fetching the packet from memory from the bandwidth of the transmission medium. The decoupling becomes critical if the latency of fetching the packet from memory is longer than the transmission latency, or if the fetch latency is unpredictable, because without it the FIFO can underflow in the middle of a packet transmission.
The entire packet needs to be fetched on-chip before transmission if the egress port supports TCP and UDP checksum offload, as the checksum is located in the packet header but is calculated over the entire payload. Therefore, the hardware must store the entire payload while it is calculating the checksum. Typically, the hardware must store more than one maximum-size packet in order to pipeline the processing and keep up with the maximum transmission rate. For example, if a port supports 9.6 kilobyte (KB) packets, it may implement a 16 KB or 32 KB FIFO. This is not an issue when a port supports a single traffic class and the maximum packet size supported is small. However, as packet sizes become larger and the number of traffic classes grows, the storage requirements become significant, especially in a device that supports multiple Ethernet interfaces.
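The dependency between the checksum and the full payload can be illustrated with a minimal Python sketch of the 16-bit one's-complement Internet checksum (RFC 1071). The function name internet_checksum is chosen here for illustration, and the sketch is simplified: a real TCP or UDP checksum also covers a pseudo-header and the transport header, which are omitted.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data, in the style of RFC 1071.

    Illustrative only: the TCP/UDP pseudo-header is not included.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


# Example words taken from RFC 1071.
payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))  # 0x220d
```

Because every payload word contributes to the sum, the value to be written into the header is not known until the last byte has been read, which is why the whole packet is buffered before the first byte is sent.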
There are implementations that share the FIFO between different traffic classes and determine which traffic class to service before the packet is fetched from memory. This approach works well when the Ethernet port does not support Per Priority Pause. However, when Per Priority Pause is enabled, this implementation can experience head-of-line blocking. The blocking occurs if a packet has been fetched into the on-chip FIFO but has not yet commenced transmission on the Ethernet, the packet is associated with a traffic class that has just been paused by a remote node, and other traffic classes that have not been paused have data to send. In this situation, the first packet occupies the FIFO, and all other traffic classes must wait until the traffic class of that first packet resumes transmission.
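The head-of-line blocking described above can be sketched with a small Python model. This is a simplified illustration, not an implementation of any particular device: each packet is represented only by its traffic class number, and the function name drain_shared_fifo is hypothetical.

```python
from collections import deque


def drain_shared_fifo(packet_classes, paused_classes):
    """Transmit from a shared FIFO until the head packet's class is paused.

    packet_classes: traffic class of each packet, in FIFO order.
    paused_classes: set of traffic classes paused by the remote node.
    Returns (sent, stuck): the classes transmitted and those still queued.
    """
    fifo = deque(packet_classes)
    sent = []
    # A FIFO can only release its head; a paused head stalls everything behind it.
    while fifo and fifo[0] not in paused_classes:
        sent.append(fifo.popleft())
    return sent, list(fifo)


# Class 3 is paused and sits at the head: classes 1 and 5 cannot be sent.
print(drain_shared_fifo([3, 1, 5], {3}))  # ([], [3, 1, 5])
# Class 3 is behind class 1: only the packet ahead of it gets out.
print(drain_shared_fifo([1, 3, 5], {3}))  # ([1], [3, 5])
```

In both cases the class 5 packet is eligible to transmit but remains stuck behind the paused class 3 packet.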
To get around this problem, some implementations use a random access memory (RAM) instead of a FIFO in order to fetch packets out of order, in case the first packet is for a traffic class that has been paused, but subsequent packets are for traffic classes that have not been paused. However, this implementation can require a large amount of buffering if multiple traffic classes are permitted to have the largest size packets stuck in the RAM, while still allowing other, unpaused, traffic classes to transmit. A tradeoff can be made in this implementation between the maximum number of traffic classes that can be paused simultaneously before head-of-line blocking occurs, and the size of the RAM. One end of the spectrum would use a small RAM but experience head-of-line blocking whenever one traffic class is paused, while the other end of the spectrum requires a large RAM and experiences no blocking. If one assumes a 9.6 KB packet size, eight traffic classes, and storage for at least one packet per traffic class supported per Ethernet port, this results in 9.6 KB×8=76.8 KB of RAM per port.
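The tradeoff between RAM size and tolerated pauses can be tabulated with a short Python sketch. The model is an assumption made for illustration: the RAM is divided into slots of one maximum-size packet each, each paused traffic class is assumed to hold exactly one slot, and one slot must remain free for unpaused traffic. The function names are hypothetical.

```python
def ram_size_kb(packet_slots, max_packet_kb=9.6):
    """RAM needed to hold packet_slots maximum-size packets, in KB."""
    return packet_slots * max_packet_kb


def paused_classes_tolerated(packet_slots):
    """Paused classes that can each hold one slot while one slot stays free.

    Simplifying assumption: one stuck maximum-size packet per paused class.
    """
    return max(packet_slots - 1, 0)


for slots in (1, 2, 4, 8):
    print(slots, round(ram_size_kb(slots), 1), paused_classes_tolerated(slots))
```

Under these assumptions, a single slot blocks as soon as one class is paused, while eight slots (76.8 KB at 9.6 KB per packet) leave room for traffic even with seven of eight classes paused.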
The calculation of the conventional receive buffer size is largely based upon receiver considerations. As noted in the IEEE 802.1Qbb standard, in order to assure that frames are not lost due to lack of receive buffer space, receivers must ensure that a PFC PAUSE frame is sent while there remains sufficient receive buffer to absorb the data that may continue to be received while the system is responding to the PFC PAUSE. The precise calculation of this buffer requirement is highly implementation dependent.
The processing and queuing delays are the time required for a station to detect that it is low on receive buffer space, queue the appropriate PFC PAUSE, finish transmitting any frame currently being transmitted, and then transmit the PFC PAUSE. In general, the time to detect the need to transmit the PFC PAUSE and queue it is negligible. However, this may occur just as the transmitter is beginning to transmit a maximum length frame. Assuming a maximum length frame of 2000 octets and a PFC PAUSE frame of 64 octets, the total worst-case delay would be (2000+64)×8=16,512 bit times. This value would need to be increased appropriately if larger frame sizes are supported or if additional processing time is required within the implementation.
Next, the propagation delay across the media must be considered. The propagation speed across twisted pair is approximately 0.66×C, where C is the speed of light (3×10^8 m/s). Thus, for 10G 802.3 links, the propagation delay works out to 10^10/(0.66×C) bit times/m, or approximately 50.5 bit times/m. Assuming a cable length of 100 m, 5051 bit times results.
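The propagation figure can be checked with a few lines of Python. This is only a sketch of the arithmetic in the preceding paragraph; the constant and function names are chosen here for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8   # C, as approximated in the text
VELOCITY_FACTOR = 0.66         # propagation speed as a fraction of C
LINE_RATE_BPS = 10e9           # 10 Gb/s link


def propagation_delay_bit_times(length_m, line_rate_bps=LINE_RATE_BPS,
                                velocity_factor=VELOCITY_FACTOR):
    """One-way propagation delay over length_m meters, in bit times."""
    seconds = length_m / (velocity_factor * SPEED_OF_LIGHT_M_PER_S)
    return seconds * line_rate_bps


one_way = propagation_delay_bit_times(100)
print(round(one_way))  # 5051
```

The delay scales linearly with both cable length and line rate, so a longer or faster link increases the amount of data in flight by the same factor.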
The response time for 10GBASE-T is 14,336 bit times plus the PHY delay of 25,600 bit times (see clause 55.11 of IEEE Std 802.3an™-2006) for a total of 39,936 bit times. In addition, it is possible that a maximum length frame has just begun transmission, thus adding 16,000 bit times for a total of 55,936 bit times. Finally, the return propagation delay (which accounts for data that is already in transit when the PFC PAUSE takes effect) accounts for an additional 5051 bit times. This results in a grand total of 82,550 bits (approximately 10.1 KB) of buffering required for a 100 m 10 Gb/s link. As stated previously, more or less buffering may be required to account for implementation-specific characteristics such as larger frame sizes, variances in the processing time of generating the PFC PAUSE frame, granularity of buffer allocation, and possible sharing of buffers, among other factors. However, in general, the buffer requirements are approximately 2×(media delay+maximum frame length)+length of PFC PAUSE+the responding end response time.
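The general formula at the end of the preceding paragraph can be checked against the quoted grand total with a short Python sketch. The function name pfc_buffer_bits is hypothetical, all inputs are in bit times, and the conversion to KB assumes 1 KB = 1024 bytes, which is the reading that reproduces the 10.1 KB figure.

```python
def pfc_buffer_bits(media_delay, max_frame, pause_frame, response_time):
    """Approximate receive buffering in bits:
    2 x (media delay + maximum frame length) + PFC PAUSE length + response time.
    """
    return 2 * (media_delay + max_frame) + pause_frame + response_time


total_bits = pfc_buffer_bits(
    media_delay=5051,              # 100 m link at 10 Gb/s, one way
    max_frame=2000 * 8,            # 2000-octet maximum length frame
    pause_frame=64 * 8,            # 64-octet PFC PAUSE frame
    response_time=14336 + 25600,   # 10GBASE-T response time plus PHY delay
)
print(total_bits)                         # 82550
print(f"{total_bits / 8 / 1024:.1f} KB")  # 10.1 KB
```

Substituting a larger maximum frame size shows how quickly the requirement grows, since the frame length is counted twice.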
It would be advantageous if the above-mentioned buffer explosion problem could be solved without any head-of-line blocking, retaining the conventional buffer requirement needed for a port with a single traffic class, while still supporting the maximum number of traffic classes possible on the Ethernet port.