The transmission control protocol/internet protocol (TCP/IP) is a protocol suite that has become widely used for communications. However, receiving, buffering, processing and storing the data communicated in TCP segments can consume a substantial amount of host processing power and memory bandwidth at the receiver. In a typical system, reception includes processing in multiple communications layers before the data is copied to its final destination in an application buffer. A typical network interface card (NIC) processes the Layer 2 headers (e.g., Ethernet headers) and then copies the remaining headers (e.g., Layer 3 and higher headers) and/or the Upper Layer Protocol (ULP) payload to a transport buffer (e.g., a TCP buffer) for networking and transport layer processing. The transport and networking processing (e.g., TCP/IP where TCP is the transport layer protocol) removes the Layer 3 and Layer 4 headers and copies the remaining headers and ULP payload to another buffer. This process repeats at the next level until the last header is removed and the ULP payload is copied to the buffer assigned by the application. Most of the bytes in the frames are payload (e.g., data), but they are copied again and again as the control portion of the frames (e.g., the headers) is processed in a layered fashion. The host CPU performs this work and incurs a high overhead of processing and copying including, for example, handling many interrupts and context switches. Thus, very few cycles are available for application processing, which is the desired use of a server machine. For high-speed networking (e.g., 10 Gigabits per second), the additional copying strains the memory subsystem of the computer. With an average of three data copies, the memory subsystem of most commercially available server computers becomes a bottleneck, thereby preventing the system from supporting 10 Gigabit network traffic.
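The memory-bandwidth pressure described above can be estimated with simple arithmetic. The following is a minimal sketch, assuming that each copy reads every payload byte from memory once and writes it once, and that header bytes are negligible. The function name and this accounting model are illustrative assumptions, not figures taken from any particular server.

```python
def copy_memory_traffic_gbps(line_rate_gbps, copies):
    """Estimate memory traffic caused by copying received data.

    Assumption: each copy costs one read plus one write of every byte,
    so the memory traffic is 2 * copies * line rate.
    """
    if line_rate_gbps < 0 or copies < 0:
        raise ValueError("line rate and copy count must be non-negative")
    return 2 * copies * line_rate_gbps


# Three copies at 10 Gigabits per second, as in the text above.
three_copies = copy_memory_traffic_gbps(10, 3)
# A single copy from the wire to the application buffer, for comparison.
one_copy = copy_memory_traffic_gbps(10, 1)
print(three_copies, one_copy)
```

Under these assumptions, three copies generate six times the line rate in memory traffic, whereas a single copy generates only twice the line rate.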
Since TCP/IP is the dominant transport protocol used by most applications today, it would therefore be useful to ease the burden of this processing to achieve, for example, scalable low CPU utilization when communicating with a peer machine.
What is needed to reduce the overhead is to ensure that data is copied only once, from the wire to the application buffer. A problem is that the NIC has no idea what portion of a received frame is, for example, ULP data and what portion is ULP control. What is needed is to have the sender build the frames in a way that makes it easy for the receiver NIC to make this distinction. However, each ULP may have its own way of mixing data and control, thereby making it very difficult to build a NIC that supports them all.
Another problem is that TCP offers a byte stream service to the ULP. It is not always possible to tell the beginning of a ULP message (e.g., the protocol data unit (PDU)) inside that endless stream of bytes (e.g., the TCP data). Assuming that the frames arrive without resegmentation at the receiver (e.g., a server), the receiver may unpack the frame using TCP and might be able to locate the ULP header. The ULP header may include, for example, control information that may identify a location in the application buffer where the ULP PDU (ULPDU) may be directly placed. However, even if a sender could somehow be adapted to employ, in every TCP segment, a TCP layer adapted to place ULP control information starting in the first payload byte of the TCP segment, it might not be enough. This is because resegmentation is not uncommon in TCP/IP communications. There is no guarantee that the TCP segments will arrive at the other end of the wire the way the sender built them because, for example, there may be network architectural structures between the sender and the receiver. For example, an intermediate box or middle box (e.g., a firewall) may terminate the TCP connection with the sender and, without the sender or the receiver being aware, may initiate another TCP connection with the receiver. The intermediate box may resegment the incoming frames (e.g., use a smaller TCP payload). Thus, a single frame may enter the intermediate box, but a plurality of smaller frames, each with its own TCP header, may exit the intermediate box. This behavior by the middle box may disrupt the carefully placed control and data portions.
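The effect of resegmentation on header alignment can be illustrated with a small simulation. This is a sketch under stated assumptions: the sender places exactly one ULPDU in each TCP segment so that every segment begins with ULP control information, and the middle box re-cuts the byte stream at a smaller fixed payload size. The function names and the sizes used are hypothetical.

```python
def segment_start_offsets(segments):
    """Return the byte-stream offset at which each segment begins."""
    offsets, position = [], 0
    for segment in segments:
        offsets.append(position)
        position += len(segment)
    return offsets


def resegment(segments, payload_size):
    """Model a middle box that re-cuts the TCP byte stream into smaller payloads."""
    stream = b"".join(segments)
    return [stream[i:i + payload_size] for i in range(0, len(stream), payload_size)]


# Sender: three 1000-byte ULPDUs, one per TCP segment (hypothetical sizes).
sent = [bytes([n]) * 1000 for n in range(3)]
pdu_starts = set(segment_start_offsets(sent))

# Middle box: forwards the same bytes in 600-byte payloads.
received = resegment(sent, 600)
aligned = [off for off in segment_start_offsets(received) if off in pdu_starts]

print(len(received), aligned)
```

In this example only the first of the five received segments still begins at a ULPDU boundary, although the byte stream itself is unchanged.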
In the case of resegmentation, the receiver may face a number of challenges. For example, the receiver may not be aware that there are any intermediate boxes between the sender and the receiver. In addition, the initial segmenting scheme used by the sender may not be the segmenting scheme seen by the receiver. Thus, although the receiver may be able to order the smaller frames, the receiver may be unable to locate, for example, the ULP header and the ULPDU. Accordingly, the receiver may not be able to ascertain the control and boundary information that may be necessary to correctly place the ULPDU in the proper location of, for example, the application buffer of the receiver.
Another problem is that TCP/IP networks may deliver segments out of order. The ULP may have a PDU larger than one TCP segment, which may be limited to 1460 bytes when used on top of Ethernet, and the ULPDU may be split among a plurality of TCP segments. Therefore, some TCP segments may contain, for example, only data and no control information that may instruct the receiving NIC as to where to place the data. The receiver is faced with a choice of dropping the out-of-order segments and requesting a retransmission, which is costly in terms of delay and performance loss, or buffering the out-of-order segments until all the missing segments have been received. Some implementations may choose to accumulate all the out-of-order segments, to wait for the missing TCP segments to be received and then to place them in order. The receiving NIC may then process the whole set of TCP segments, as it uses the control portion to obtain data placement information. This process adds the cost of the temporary buffer and requires a more powerful CPU and a wider data path than would otherwise be needed. The receiving NIC must process all the accumulated TCP segments in parallel with processing other TCP segments at wire speed, since traffic on the link continues to arrive. The out-of-order segments may create a “processing bubble” for the receiver.
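The buffering alternative described above can be sketched as a simple reordering buffer. This is a minimal illustration, assuming segments are identified by the byte-stream offset of their first byte and never partially overlap. The class and method names are hypothetical and do not represent any particular NIC or TCP implementation.

```python
class ReorderBuffer:
    """Hold out-of-order segments until the missing bytes arrive."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq  # next in-order byte expected
        self.pending = {}            # sequence number -> buffered payload

    def receive(self, seq, data):
        """Accept one segment; return the payloads now deliverable in order."""
        if seq < self.next_seq:
            return []                # duplicate of data already delivered
        self.pending[seq] = data
        delivered = []
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            delivered.append(chunk)
            self.next_seq += len(chunk)
        return delivered

    def buffered_bytes(self):
        """Temporary memory currently held for out-of-order data."""
        return sum(len(chunk) for chunk in self.pending.values())


buf = ReorderBuffer()
held_second = buf.receive(1460, b"B" * 1460)   # arrives early, must be held
held_third = buf.receive(2920, b"C" * 1460)    # also held
peak = buf.buffered_bytes()
released = buf.receive(0, b"A" * 1460)         # gap filled, everything released
print(peak, len(released))
```

The burst released by the last call corresponds to the accumulated segments that the receiver must process at once while new traffic keeps arriving.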
A proposed solution for locating the ULP header is to use the TCP ULP framing (TUF) protocol. According to the TUF protocol, a sender places a special value (i.e., a key) within the TCP segment starting at the first byte following the TCP header as illustrated in FIG. 1. The key may be a unique value (e.g., a particular 48-bit value) for which the receiver may search. Accordingly, when the receiver finds the key, the receiver has also found, for example, the ULP header or the beginning of the control information (e.g., the first byte of the DDP/RDMA header). However, the TUF protocol is probabilistic in nature. For example, the unique value may occur by accident within the ULPDU. Furthermore, in the face of, for example, resegmentation or TCP retransmission (e.g., from an improper TCP sender), the receiver may misidentify the beginning of the control information, resulting in the silent corruption of the data due to placement in the wrong host memory location. Although the unique value can be increased in length to make such a misidentification event less likely, a nonzero probability always remains. The key may also present a security risk if an unauthorized receiver is able to obtain the unique value, allowing the unauthorized receiver to access the ULP payload.
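The probabilistic weakness of a key search can be shown with a short sketch. The key value, the segment layout and the function names below are hypothetical. The estimate of accidental matches additionally assumes that payload bytes are independent and uniformly random, which real application data generally is not.

```python
KEY = bytes.fromhex("a5a5deadbeef")  # hypothetical 48-bit key value


def find_key_offsets(data, key=KEY):
    """Return every offset at which the key pattern appears in the data."""
    offsets = []
    start = data.find(key)
    while start != -1:
        offsets.append(start)
        start = data.find(key, start + 1)
    return offsets


def expected_accidental_matches(n_bytes, key_bits=48):
    """Expected number of chance matches in n_bytes of uniformly random data."""
    positions = max(n_bytes - key_bits // 8 + 1, 0)
    return positions / 2 ** key_bits


# A payload that happens to contain the key pattern inside the ULP data.
payload = KEY + b"hdr" + b"x" * 10 + KEY + b"y" * 4
matches = find_key_offsets(payload)
print(matches)
print(expected_accidental_matches(10**12))
```

The second match in this example is ordinary data that a searching receiver could mistake for the start of control information. A longer key lowers the expected number of such matches but never brings it to zero.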
Another solution to locating a particular header is to use a fixed interval markers (FIM) protocol. The FIM protocol uses only forward-pointing markers and has been limited to internet small computer system interface (iSCSI) applications. In the FIM protocol, a forward-pointing marker is placed in a known location inside the TCP byte stream. This enables the receiver to possibly locate it in the endless TCP byte stream. The FIM marker points forward to the beginning of the iSCSI header as shown in FIG. 2. The marker is placed, by default, every 8192 bytes, although this is negotiable. However, the FIM protocol may have a disadvantage because the markers are placed only sparsely, every 8192 bytes. Accordingly, many frames may need to be buffered before the iSCSI header can be identified, if it can be identified at all. Other iSCSI headers may have no FIM marker pointing to them, such that the receiver has to process the TCP segments in order to be able to place the iSCSI data. The FIM protocol also does not provide a guarantee that the iSCSI header is located immediately following the TCP header or that the iSCSI header is even placed in its entirety in one TCP segment. To use the FIM protocol, the receiver has to store locally the TCP sequence location pointed to by each FIM. The receiver uses this stored location when the TCP segment containing that location is received (i.e., additional state information for every FIM received is stored until the corresponding TCP segment with the iSCSI header is received). The FIM protocol does not provide any suggestion or teaching as to the processing of out-of-order TCP segments if the received out-of-order TCP segments are less than the FIM distance (e.g., 8192 bytes in the default). The FIM protocol is also limited to iSCSI applications and may not provide a generic solution for the framing problem that may be needed by all applications using high-speed TCP/IP.
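The sparseness of fixed interval markers can be illustrated with a simplified model. This sketch assumes that a marker sits at every multiple of the interval in the byte stream, that the bytes occupied by the markers themselves are ignored, and that each marker points to the next header at or after its own position. The function names and header offsets are hypothetical, and the model does not reproduce the actual iSCSI marker encoding.

```python
import bisect


def marker_pointers(header_offsets, stream_len, interval=8192):
    """Return (marker_position, forward_distance) pairs for a byte stream."""
    headers = sorted(header_offsets)
    pointers = []
    for position in range(interval, stream_len, interval):
        index = bisect.bisect_left(headers, position)
        if index < len(headers):  # a header exists at or after this marker
            pointers.append((position, headers[index] - position))
    return pointers


def headers_without_marker(header_offsets, stream_len, interval=8192):
    """Return the header offsets that no marker points to."""
    pointed = {pos + dist for pos, dist in
               marker_pointers(header_offsets, stream_len, interval)}
    return [h for h in sorted(header_offsets) if h not in pointed]


# Hypothetical header positions in a 24000-byte stretch of the stream.
headers = [0, 3000, 6000, 9000, 12000, 20000]
pointers = marker_pointers(headers, 24000)
unmarked = headers_without_marker(headers, 24000)
print(pointers)
print(unmarked)
```

In this model only two of the six headers are reachable through a marker. The receiver can find the others only by processing the stream in order, and it must remember each pointed-to location until the segment containing it arrives.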
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of ordinary skill in the art through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.