When a network-enabled device transmits a relatively large chunk of data (hereinafter referred to as a segment), e.g. a JPEG picture or any other kind of data, over a network, the device typically splits the segment into several data packets before transmitting it on the physical medium. As one example, the maximum payload size of an Ethernet packet is 1500 bytes. Additionally, protocol headers, such as TCP and IP headers, must be generated for each data packet, and for some protocol stacks (such as TCP/IP) one or more checksums must also be calculated for each data packet and stored in a header of that data packet.
Normally, a segment is larger than the maximum payload size of a packet, often several times larger (although in principle a segment may be as small as a single byte).
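As a rough illustration of the relationship between segment size and packet count, the following minimal sketch computes how many packets a segment needs. The function name packets_needed and the default of 1500 bytes (the Ethernet figure mentioned above) are assumptions made for illustration only; header overhead is ignored.

```python
def packets_needed(segment_bytes: int, max_payload_bytes: int = 1500) -> int:
    """Return how many packets are needed to carry a segment.

    Assumes every packet except possibly the last carries a full payload
    and that protocol headers do not reduce the usable payload.
    """
    if segment_bytes < 1 or max_payload_bytes < 1:
        raise ValueError("sizes must be positive")
    # Ceiling division without floating point.
    return (segment_bytes + max_payload_bytes - 1) // max_payload_bytes
```

Under these assumptions, a single-byte segment still occupies one packet, while a 64 KiB segment is spread over several dozen packets, each requiring its own headers and checksum.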
The process of generating these packets, including headers and checksums, is traditionally done in software. Depending on the CPU performance and the network speed of the system, this may consume a substantial share of the available CPU capacity.
Various solutions have been proposed that address this by implementing these functions in hardware. Such approaches are often referred to as (hardware) network offloading.
There are several available hardware solutions to perform network offloading. One traditional method of doing network offloading in hardware can be summarized as:
1. Read one MTU (Maximum Transmission Unit, i.e. the maximum payload size of one packet on the used network medium) of data from a memory.
2. Calculate a checksum over the payload while reading from memory.
3. Generate headers.
4. Transmit a packet comprising the headers and payload.
5. While transmitting, read one additional MTU of data from the memory.
6. Repeat until all packets in the segment, i.e. the chunk of data, have been transmitted.
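The steps above can be sketched in software as follows. This is only a sequential illustration, not a description of any particular hardware: the 8-byte header layout (offset, length, checksum) is a hypothetical stand-in for real TCP and IP headers, the checksum is the 16-bit one's-complement Internet checksum computed over the payload alone (a real TCP checksum also covers a pseudo-header and the TCP header), and the overlap between transmitting one packet and reading the next is not modelled.

```python
import struct
from typing import Iterator

MTU = 1500  # assumed maximum payload size per packet, in bytes


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (as used by IP/TCP/UDP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def packetize(segment: bytes, mtu: int = MTU) -> Iterator[bytes]:
    """Yield one packet (toy header + payload) per MTU-sized slice."""
    for offset in range(0, len(segment), mtu):
        payload = segment[offset:offset + mtu]        # read one MTU of data
        checksum = internet_checksum(payload)         # checksum over the payload
        header = struct.pack("!IHH", offset, len(payload), checksum)  # toy header
        yield header + payload                        # hand packet to the sender
```

Note that nothing in this loop limits how quickly packets are produced; every packet of the segment is emitted back to back.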
A drawback of this solution is that it may send many packets at a very fast rate, which may cause problems for receiving clients on relatively slower connections. As an example, consider a device connected to a relatively fast network (e.g. a 1 gigabit/second network) that wants to transmit a large segment to a client connected to a relatively slower network (e.g. a 10 megabit/second network). In this case, routers in the transmission path between the device and the client have to buffer the data until the client has read it all. This may result in buffer exhaustion in the routers. Furthermore, as the routers have a limited buffer size and may also need to buffer data for other transmission paths, this may lead to data packets being dropped, thereby requiring retransmission. The probability of a packet drop increases with larger segment sizes.
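The buffering problem can be made concrete with a simple fluid model. The sketch below is an assumption-laden estimate, not a model of any real router: it treats the segment as a continuous burst arriving at the fast link rate and leaving at the slow link rate, and it ignores headers, other traffic and retransmissions. The function names are chosen for illustration.

```python
def peak_backlog_bytes(segment_bytes: float, ingress_bps: float,
                       egress_bps: float) -> float:
    """Largest amount of data queued in the router while the burst arrives.

    The queue grows at (ingress - egress) for the duration of the burst,
    so the peak equals segment * (1 - egress / ingress).
    """
    if egress_bps >= ingress_bps:
        return 0.0  # the outgoing link keeps up; nothing accumulates
    return segment_bytes * (1.0 - egress_bps / ingress_bps)


def dropped_bytes(segment_bytes: float, ingress_bps: float,
                  egress_bps: float, buffer_bytes: float) -> float:
    """Data that cannot be buffered and would be dropped in this model."""
    backlog = peak_backlog_bytes(segment_bytes, ingress_bps, egress_bps)
    return max(0.0, backlog - buffer_bytes)
```

With the rates from the example above (1 gigabit/second in, 10 megabit/second out), about 99% of the segment has to be held in the router at the peak, so the backlog, and with it the risk of drops, grows in proportion to the segment size.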
One previously known solution to this problem, typically implemented in software, involves sending only a limited number of packets to each destination and waiting for an acknowledgement (ACK) before sending additional data packets. However, this increases the CPU usage and may also increase the overall time needed for transmitting the data due to the waiting.
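A minimal sketch of such an ACK-gated sender is shown below. The callbacks send and wait_for_acks are hypothetical placeholders for whatever transport is actually in use, and the fixed window is a simplification; real protocols such as TCP adjust the window dynamically.

```python
from typing import Callable, Sequence


def send_windowed(packets: Sequence[bytes], window: int,
                  send: Callable[[bytes], None],
                  wait_for_acks: Callable[[int], None]) -> int:
    """Send at most `window` packets, then block until they are acknowledged.

    Returns the number of times the sender had to stop and wait.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    waits = 0
    for start in range(0, len(packets), window):
        batch = packets[start:start + window]
        for packet in batch:
            send(packet)
        wait_for_acks(len(batch))  # sender is idle here until ACKs arrive
        waits += 1
    return waits
```

The returned count makes the cost visible: each wait adds at least one round trip to the total transfer time, and the bookkeeping runs on the CPU.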
For example, US 2006/034176 describes a hardware implementation using ACKs. A drawback of this is that only protocols such as TCP/IP, which rely on the use of ACKs, can be supported.
Thus there is a need for addressing the problems of router buffer exhaustion and data packets being dropped.
This has been addressed at least to some extent by interleaving packets to different destinations, but with the packets for each destination interleaved at the same or a similar rate.
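Equal-rate interleaving of this kind can be sketched as a round-robin over per-destination packet queues. The function name and the mapping from destination name to packet list are assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Mapping, Tuple


def interleave(per_destination: Mapping[str, Iterable[bytes]]
               ) -> Iterator[Tuple[str, bytes]]:
    """Yield (destination, packet) pairs, one packet per destination in turn."""
    pending = deque((dest, iter(pkts)) for dest, pkts in per_destination.items())
    while pending:
        dest, packets = pending.popleft()
        try:
            packet = next(packets)
        except StopIteration:
            continue  # this destination is finished; drop it from the rotation
        yield dest, packet
        pending.append((dest, packets))  # back of the line for the next round
```

Every destination receives packets at the same rate while it has data pending, regardless of how fast or slow its connection is.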
U.S. Pat. No. 7,174,393 discloses a communication-processing device (CPD) for data communication that provides a fast-path, which avoids protocol processing for most large multi-packet messages, and slow-path messaging. A network processor chooses between processing messages along the slow-path, which includes a protocol stack of a host, and along the fast-path, which bypasses the protocol stack of the host.
U.S. Pat. No. 7,167,926 discloses a device working with a host computer for data communication, providing a fast-path that avoids protocol processing for most messages, accelerating data transfer and offloading time-intensive processing tasks from the host CPU. The host has a processing capability for validating selected messages for either fast-path or slow-path processing.
U.S. Pat. No. 6,996,070 discloses a TCP Offload Engine (TOE) device including a state machine that performs TCP/IP protocol processing operations in parallel. In three different aspects, it stores TCP variables and header values, updates multiple TCP state variables, and sets up a DMA move.