The ever-increasing bandwidth needs of enterprise data centers have led to the development of 10 Gbps Ethernet technology. Commercial 10 Gbps Ethernet network interface cards (NICs) have been available in the market for some time now. Transmission control protocol/Internet protocol (TCP/IP) is the most commonly used protocol for processing data both in enterprise data centers and on the Internet. Recently, a technique referred to as receive side coalescing (RSC) or large receive offload (LRO) has been introduced to increase the efficiency of TCP/IP processing. RSC allows NICs to identify packets that belong to the same TCP/IP flow and to coalesce them into a single large packet. As a result, a TCP/IP stack has to process fewer packets, which reduces the per-packet processing cost. A NIC can coalesce packets during the interrupt moderation time, and hence packet latency is not affected.
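The coalescing step described above can be illustrated with a minimal sketch. The `Segment` type and the `coalesce` function are hypothetical names introduced only for illustration; the sketch assumes a flow is identified by its address and port 4-tuple and that only segments contiguous in sequence space are merged. A real implementation would also check TCP flags, options, and size limits.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: tuple       # assumed flow key: (src_ip, src_port, dst_ip, dst_port)
    seq: int          # TCP sequence number of the first payload byte
    payload: bytes
    count: int = 1    # number of wire segments represented by this packet


def coalesce(batch):
    """Merge in-order segments of the same flow found in one receive batch."""
    out = []
    open_by_flow = {}  # flow key -> index in `out` of the packet still being built
    for seg in batch:
        idx = open_by_flow.get(seg.flow)
        if idx is not None:
            head = out[idx]
            # Merge only if this segment starts exactly where the previous one ended.
            if head.seq + len(head.payload) == seg.seq:
                out[idx] = Segment(head.flow, head.seq,
                                   head.payload + seg.payload,
                                   head.count + seg.count)
                continue
        # New flow, or a gap in sequence space: start a new packet.
        open_by_flow[seg.flow] = len(out)
        out.append(Segment(seg.flow, seg.seq, seg.payload, seg.count))
    return out
```

With three contiguous segments of one flow interleaved with one segment of another flow, the stack above this sketch would see two packets instead of four.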
Typically, RSC is implemented within NIC hardware or in a layer of the network stack below the TCP/IP stack. As packets are pulled from the driver's receive queue, they are run through the LRO code, which parses the packet contents to determine whether each packet can be coalesced. At this point, the LRO code has no knowledge of the state maintained by the TCP layer for the connection, and the TCP layer has no knowledge that it is actually receiving a large coalesced packet. In addition, typically only those packets that arrived in a burst (e.g., when the driver also implements interrupt coalescing) and are already present in the driver's receive queue are coalesced into a large frame.
Such a technique performs poorly or has limitations in certain situations. One such situation arises when the remote peer's throughput is inhibited by the receiver's reduced ACK responses. Since the TCP layer sees only a coalesced packet (instead of the actual number of segments sent by the sender), it sends at most one acknowledgement (ACK) message. If a Delayed ACK option is enabled, it may send at most one ACK for two large coalesced packets. The sender's congestion window, or its ability to transfer more data in a given round trip time, depends largely on how frequently it receives ACKs. If the acknowledgements are slow in arriving, the sender's throughput may be inhibited, which adversely affects the throughput of a single connection.
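The reduction in ACK traffic can be made concrete with a simplified count. The function `acks_sent` and its parameters are hypothetical; the model assumes a fixed coalescing factor, one ACK per packet seen by the TCP layer, and one ACK per two packets when Delayed ACK is enabled, and it ignores the delayed ACK timer.

```python
import math


def acks_sent(wire_segments, coalesce_factor=1, delayed_ack=False):
    """Estimate ACKs generated for a run of in-order segments (simplified model)."""
    # Number of packets the TCP layer actually sees after coalescing.
    packets_seen = math.ceil(wire_segments / coalesce_factor)
    # Assumed Delayed ACK behavior: one ACK for every two packets seen.
    packets_per_ack = 2 if delayed_ack else 1
    return math.ceil(packets_seen / packets_per_ack)
```

Under these assumptions, 64 wire segments produce 64 ACKs without coalescing, 8 ACKs when coalesced eight at a time, and 4 ACKs when Delayed ACK is also enabled.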
Further, consider a TCP connection that has reached steady state while transferring bulk data. At some point, some element in the network drops a packet of the connection but continues to forward further packets in the stream. For every out-of-order packet received, the receiver sends a Duplicate ACK. When the TCP sender receives three Duplicate ACKs, it retransmits the lost packet immediately without resorting to a retransmit timeout. A retransmit timeout is usually on the order of half a second or more and results in a severe reduction in network utilization, which is why TCP makes several improvements to loss recovery as part of its Fast Retransmit and Fast Recovery algorithms. With current LRO, when TCP receives one large out-of-order coalesced packet, it generates only one Duplicate ACK, and the other end is unable to follow the Fast Retransmit and Fast Recovery algorithms. Hence, connections with loss and LRO end up with retransmit timeouts and a longer recovery period than without LRO.
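The loss scenario above can be sketched as a small simulation. All names here (`lro_merge`, `receiver_acks`, `fast_retransmit_triggered`) are hypothetical, packets are modeled as `(seq, length)` pairs, and the receiver is assumed to emit exactly one ACK per packet it sees without buffering out-of-order data. The threshold of three Duplicate ACKs is taken from the description above.

```python
DUP_ACK_THRESHOLD = 3  # Duplicate ACKs needed to trigger Fast Retransmit


def lro_merge(packets):
    """Coalesce adjacent packets that are contiguous in sequence space."""
    merged = []
    for seq, length in packets:
        if merged and merged[-1][0] + merged[-1][1] == seq:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((seq, length))
    return merged


def receiver_acks(packets, rcv_nxt):
    """Return the ACK number sent for each packet the TCP layer sees."""
    acks = []
    for seq, length in packets:
        if seq == rcv_nxt:
            rcv_nxt += length   # in-order data advances the expected sequence
        acks.append(rcv_nxt)    # out-of-order data repeats the old ACK number
    return acks


def fast_retransmit_triggered(acks, snd_una):
    """True if the sender sees enough Duplicate ACKs for its oldest unacked byte."""
    duplicates = sum(1 for ack in acks if ack == snd_una)
    return duplicates >= DUP_ACK_THRESHOLD
```

For example, if the segment at sequence 1000 is lost and four 1000-byte segments starting at 2000 arrive, the uncoalesced stream yields four Duplicate ACKs and triggers Fast Retransmit, whereas the coalesced stream yields a single Duplicate ACK and does not.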
If the LRO logic is implemented at a low level, then separate changes are required to parse each different form of Layer-2 header. This becomes complicated for some applications where TCP/IP packets may be transmitted over a plethora of media, including non-traditional networks such as universal serial bus (USB) and FireWire, some of whose specifications may not be known at the time the LRO logic is implemented.
When a device acts as a bridge or router, it forwards TCP packets from an ingress interface to an egress interface based on routing tables or other logic. If LRO is blindly performed on the receive side, then the large packet needs to be broken down again into network-sized units before being sent on the egress interface. Additional processing is therefore required to make sure that packets intended to be bridged or routed do not go through the LRO path unless the outgoing interface hardware supports TCP segmentation offload. Finally, if the software LRO code is too low in the network stack, only the coalesced packet is passed through firewall rules. There may be cases where firewall rules are to be applied to individual packets.
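The extra checks implied by the forwarding and firewall cases above might look like the following sketch. The function `should_coalesce` and its three flags are hypothetical and only summarize the conditions stated in the text; how a real stack would obtain this information before coalescing is not specified here.

```python
def should_coalesce(locally_delivered, egress_supports_tso=False,
                    per_packet_firewall=False):
    """Decide whether a received TCP packet may take the LRO path."""
    # Firewall rules that must see individual packets rule out coalescing.
    if per_packet_firewall:
        return False
    # Packets terminating on this host can be coalesced.
    if locally_delivered:
        return True
    # Bridged or routed packets are coalesced only if the egress hardware
    # can split them again using TCP segmentation offload.
    return egress_supports_tso
```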