The present invention relates generally to methods and devices for network communications, and specifically to streamlining operation of reliable communication transport protocols.
The Transmission Control Protocol/Internet Protocol (TCP/IP) suite is a widely used protocol suite in digital packet networks. The TCP is described by Postel in RFC 793 of the U.S. Defense Advanced Research Projects Agency (DARPA), entitled “Transmission Control Protocol: DARPA Internet Program Protocol Specification” (1981), which is incorporated herein by reference. TCP is a connection-oriented, end-to-end, full-duplex protocol, which provides for reliable inter-process communication between pairs of processes in host computers. The information exchanged between TCP peers is packed into datagrams termed segments, each segment comprising a TCP header followed by payload data. The segments are transported over the network in IP packets.
FIG. 1 is a schematic block diagram depicting a structure of a Transmission Control Protocol (TCP) segment header 10, as is known in the art and specified in RFC 793. Header 10 begins with a source port 12 and a destination port 14, which are 16-bit identifiers respectively indicating the origin and intended destination of the TCP segment. As noted, the TCP is a connection-oriented protocol, signifying that messages are exchanged between two identified end-points, between which a connection has been established. Since the TCP supports multiplexing, i.e., many processes within a single host computer may communicate independently, port numbers are assigned to each process to identify its interface to the TCP. Port numbers are unique within a host computer; however, there is no guarantee of uniqueness across different computers. In order to produce an identifier which is unique throughout all networks, a port identifier is combined with an internet address, generating an identifier termed a socket.
A logical communication channel established between pairs of sockets is termed a connection. Connections are established after a three-way handshake process has completed successfully. An important element in the reliability of the TCP is the use of sequence numbers. Each octet (8 bits) of data transmitted is assigned a sequence number by the sending process. The receiving process is required to acknowledge receipt of all octets received, by sending an acknowledgment (ACK) verifying the last sequence number successfully received. Sequence numbers provide a way for the sender to identify missing data requiring re-transmission, as well as a way for the receiver to order data which arrives out of sequence. Thus, a sequence number 16 contains a sequence number for a first octet of data in the TCP segment payload. If an ACK flag 24 is set, an acknowledgment number 18 contains the value of the next sequence number the sender of the acknowledgment segment is expecting to receive.
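By way of illustration only, the acknowledgment rule described above can be sketched in Python as follows. The function name `next_expected` and the representation of received data as (sequence number, length) pairs are assumptions made for this example and are not taken from RFC 793. The sketch also assumes that received segments do not partially overlap.

```python
SEQ_MOD = 1 << 32  # sequence numbers are 32-bit values and wrap around


def next_expected(initial_seq, segments):
    """Return the acknowledgment number a receiver would send.

    initial_seq: sequence number of the first octet the receiver expects.
    segments:    iterable of (seq, length) pairs for received payloads,
                 in any arrival order (hypothetical representation).
    """
    received = {seq: length for seq, length in segments}
    expected = initial_seq
    # Advance over every segment that starts exactly where the
    # previously accepted data ended; stop at the first gap.
    while received.get(expected, 0) > 0:
        expected = (expected + received.pop(expected)) % SEQ_MOD
    return expected
```

For example, if segments starting at 1000, 1100 and 1300 (each 100 octets long) have arrived, the acknowledgment number remains 1200 until the missing segment is received, which is how the sender learns which data requires re-transmission.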
The TCP header contains six flags indicative of additional control information. A RST flag 28 indicates a request to reset the connection. A SYN flag 30 indicates that the segment is part of the three-way handshake process. A PSH flag 26 directs the receiving process to make transmitted data available immediately to the application level without waiting for a timeout or full buffer. A FIN flag 32 indicates that the sender has no more data to send.
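The fixed portion of the header described above may be decoded, for example, as in the following minimal Python sketch. The field layout and flag bit positions follow RFC 793 (which also defines a URG flag, together with window, checksum and urgent pointer fields); the function name `parse_tcp_header` and the dictionary keys are assumptions made for this example.

```python
import struct

# Flag bit positions in the low six bits of the 16-bit offset/flags word.
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08,
             "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


def parse_tcp_header(segment):
    """Decode the 20-octet fixed TCP header at the start of `segment`."""
    if len(segment) < 20:
        raise ValueError("segment shorter than the fixed TCP header")
    # Network byte order: two 16-bit ports, 32-bit sequence number,
    # 32-bit acknowledgment number, then four 16-bit words.
    (src_port, dst_port, seq, ack,
     off_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # The data offset counts 32-bit words, options included.
        "header_len": (off_flags >> 12) * 4,
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }
```

Any octets between the fixed 20-octet header and `header_len` belong to the options field.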
An options field 40 provides a way to extend the original protocol while preserving compatibility with earlier implementations. The options field is used to synchronize various parameters during connection establishment, e.g., window scale and maximum segment size. In addition, the options field can convey information which is useful on an established connection, for example a Selective Acknowledgment (SACK) option and a timestamp (TS) option. The SACK option is described by Mathis, et al. in RFC 2018 of the Network Working Group, entitled “TCP Selective Acknowledgment Options” (1996), which is incorporated herein by reference. The SACK option supplements acknowledgment number 18 by providing a way to recover quickly from a single missing segment or a consecutive set of missing segments by using an additional acknowledgment number indicating segments received after the missing segments.
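As a non-limiting illustration, the options field may be walked as a sequence of kind/length/value entries, as in the following Python sketch. The option kind numbers used (0 end of list, 1 no-operation, 2 maximum segment size, 3 window scale, 5 SACK, 8 timestamp) are those assigned in RFC 793, RFC 1323 and RFC 2018; the function name `parse_tcp_options` and the labels in the returned tuples are assumptions made for this example.

```python
import struct


def parse_tcp_options(options):
    """Return a list of (name_or_kind, value) tuples from raw option octets."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # end of option list
            break
        if kind == 1:          # no-operation, used for padding
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]  # counts the kind and length octets too
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        data = options[i + 2:i + length]
        if kind == 2 and length == 4:
            parsed.append(("mss", struct.unpack("!H", data)[0]))
        elif kind == 3 and length == 3:
            parsed.append(("window_scale", data[0]))
        elif kind == 5 and length >= 10 and (length - 2) % 8 == 0:
            # Each SACK block is a (left edge, right edge) pair.
            blocks = [struct.unpack("!II", data[j:j + 8])
                      for j in range(0, len(data), 8)]
            parsed.append(("sack", blocks))
        elif kind == 8 and length == 10:
            # (timestamp value, timestamp echo reply)
            parsed.append(("timestamp", struct.unpack("!II", data)))
        else:
            parsed.append((kind, bytes(data)))  # unrecognized: keep raw
        i += length
    return parsed
```

Unrecognized options are skipped by their length rather than rejected, which reflects how the options mechanism preserves compatibility with earlier implementations.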
The TS option is described by Jacobson, et al. in RFC 1323 of the Network Working Group, entitled “TCP Extensions for High Performance” (1992), which is incorporated herein by reference. The TS option supplies a way to measure round-trip delivery times for segments, i.e., the time between the transmission of a segment and the receipt of an acknowledgment for the segment. This facility allows a TCP implementation to adapt acknowledgment timers to dynamic network behavior.
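One way in which a round-trip measurement obtained from the TS option might be used to adapt a timer is sketched below in Python. The smoothing constants (1/8 and 1/4) and the formula for the timeout follow the standard retransmission timer computation of RFC 6298, which is not part of the references cited above; the class name `RttEstimator` and its method are assumptions made for this example. The sketch omits the clock-granularity term and the minimum timeout bound of that specification.

```python
class RttEstimator:
    """Smoothed round-trip estimator driven by echoed timestamps."""

    ALPHA = 1 / 8  # gain for the smoothed round-trip time
    BETA = 1 / 4   # gain for the round-trip time variation

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def on_ack(self, now, ts_echo):
        """Update the estimate from one acknowledgment.

        now:     local clock reading when the acknowledgment arrived.
        ts_echo: timestamp echoed by the peer, i.e. the local clock
                 reading when the acknowledged segment was sent.
        Returns the resulting timeout, in the same units as the clock.
        """
        sample = now - ts_echo
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variation is updated using the previous smoothed value.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt + 4 * self.rttvar
```

With a clock in milliseconds, a first sample of 100 yields a timeout of 300; a subsequent sample of 200 raises it to 362.5, showing how the timer tracks changing network delay.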
For the past twenty years, TCP/IP has been implemented as a software suite, typically as a part of computer operating systems. Within the TCP/IP software suite, the TCP receiver function is the largest logical task. A number of authors have suggested strategies for enhancing the performance of TCP receiver processing. For example, Van Jacobson proposed a header prediction algorithm in 1990. The algorithm is described in TCP/IP Illustrated, Volume 2: The Implementation, by Wright and Stevens, section 28.4, pp. 936ff, published by Addison-Wesley, 1995, which is incorporated herein by reference. The header prediction algorithm posits that the majority of incoming TCP segments fall into a single category: segments correctly received, in proper order. For this category of segments, a large part of the TCP receiver logic may be bypassed, thereby greatly streamlining the process. However, notwithstanding this and numerous other improvements, software implementations of TCP receiver logic are limited by operating system performance constraints, as well as inefficiencies deriving from the serial nature of program execution in general-purpose microprocessors and associated overhead.
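The classification underlying header prediction can be illustrated, in simplified form, by the following Python sketch. It is loosely modelled on the conditions discussed by Wright and Stevens and is not a reproduction of any actual implementation; the `Connection` and `Segment` classes, their field names, and the function `takes_fast_path` are assumptions made for this example, and several conditions checked by real implementations are omitted.

```python
from dataclasses import dataclass

# Flags whose presence (other than ACK alone) forces full processing.
CONTROL_FLAGS = frozenset({"SYN", "FIN", "RST", "URG", "ACK"})


@dataclass
class Connection:
    established: bool  # connection is in the established state
    rcv_nxt: int       # next sequence number expected from the peer
    snd_wnd: int       # send window last advertised by the peer


@dataclass
class Segment:
    seq: int
    flags: frozenset
    window: int
    payload_len: int


def takes_fast_path(conn, seg):
    """Return True if the segment is the expected, in-order common case."""
    return (conn.established
            # Only ACK may be set among the control flags; PSH is ignored.
            and (seg.flags & CONTROL_FLAGS) == {"ACK"}
            # The segment starts exactly at the next expected octet.
            and seg.seq == conn.rcv_nxt
            # The peer has not changed its advertised window.
            and seg.window == conn.snd_wnd)
```

Segments for which the function returns False, such as out-of-order data or segments carrying SYN, FIN or RST, would be handed to the complete receiver logic.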
As long as network speed was the main factor limiting receiver rates, software implementations of TCP receiver logic provided adequate performance levels. However, with the advent of network speeds in the Gbps and 10 Gbps range, this is no longer the case. Faster TCP receiver processing is required. In an attempt to release the resulting bottleneck, attention has turned to the development of a dedicated hardware implementation, or acceleration, of TCP/IP receiver logic. Optimizing a hardware implementation calls for a new approach to the original specification in RFC 793. Among the issues to be addressed are maximization of parallel processing, efficient information passing, and rapid classification and handling of segments.
U.S. Pat. No. 5,056,058, to Hirata et al., whose disclosure is incorporated herein by reference, describes high speed processing of data packets using header prediction. Hirata describes a comparison circuit which forwards packets selectively to either a high speed processing section or a low speed processing section. The prediction is made according to a previously transmitted packet, and the circuit prepares information necessary for a process of receiving a subsequent packet.
U.S. Pat. No. 5,678,060 to Yokoyama et al., whose disclosure is incorporated herein by reference, describes equipment for connecting a computer system to a network. The equipment includes a header retrieval unit which retrieves a header corresponding to the protocol header of a received frame, and uses the retrieved header for predicting a protocol header of a frame to be next received, in correspondence to each of a plurality of connections to the network.
U.S. Pat. No. 5,991,299 to Radogna et al., whose disclosure is incorporated herein by reference, discloses a method for translating frame headers at speeds approximating the reception rate of frames on communication links. The translation uses a dedicated microsequencer which identifies a receive frame encapsulation type and a transmit frame encapsulation type and based on such identification, selects a processing routine which is then executed to translate the frame header. The microsequencer controls the movement of information from an input memory, through a dedicated header processor, to an output memory. The headers of the respective frames are translated within the dedicated header processor to facilitate header translation at high speeds.
U.S. Pat. No. 6,122,670 to Bennett et al., whose disclosure is incorporated herein by reference, is directed to a hardware implementation of TCP/IP packet handling functions. The system includes a computer at a node having a backplane, a CPU board plugged into the backplane, software instructions for the CPU, and a special network board plugged into the backplane. In addition to handling the packets, the system temporally interleaves the processing of different levels of the TCP/IP protocol stack to process a datagram.
U.S. Pat. No. 6,144,996 to Starnes et al., whose disclosure is incorporated herein by reference, describes a system that offers accelerated delivery of content to requestors while guaranteeing, during worst case conditions, a minimum level of service (for content delivery) to the requestors. Utilization of processing resources is monitored and managed so that performance of the system guarantees the minimum level of service.
U.S. Pat. No. 6,173,333 to Jolitz et al., whose disclosure is incorporated herein by reference, describes a network accelerator for TCP/IP which includes mask programmable logic for performing network protocol processing at network signaling rates. Mask programmable logic is stated to be faster and less expensive to construct than available RISC (Reduced Instruction Set Computer) CPU assisted TCP/IP processing boards. The programmable logic is configured in a parallel pipelined architecture controlled by state machines and implements processing for predictable patterns of the majority of transmissions. Incoming packets are compared with patterns corresponding to classes of transmissions which are stored in a content addressable memory, and are simultaneously stored in a dual port, dual bank application memory. Processing of packet headers is performed in parallel and during memory transfers without the necessity of conventional store and forward techniques, resulting in a substantial reduction in latency. Packets which constitute exceptions or which have checksum or other errors are processed in software.
U.S. Pat. No. 6,179,489 to So, et al., whose disclosure is incorporated herein by reference, describes a process for operating a computer system having an operating system, an application program, and a third program. The process uses a first processor having a first instruction set, and a second processor having a second, different, instruction set. The third program establishes message handling functions and bus mastering data transfer operations for the second processor between a host running the operating system and the second processor running the third program.
U.S. Pat. No. 6,208,651 to Van Renesse, et al., whose disclosure is incorporated herein by reference, describes a system which reduces the communication latency of complex layered communication protocols. The system reduces both the message header overhead imposed by layered protocols, and the message processing overhead, by classifying, collecting and aligning the headers. The system also applies pre- and post-processing of a message, packet filtering, and packing and unpacking of messages in cases where a backlog of messages has to be processed.
U.S. Pat. Nos. 6,226,680, 6,247,060 and 6,334,153 to Boucher, et al., whose disclosures are incorporated herein by reference, describe a card associated with a host computer for protocol processing. The card provides a fast-path that avoids protocol processing for some large multipacket messages, and also assists the host for those message packets that are chosen for processing by host software layers. A communication control block for a message is defined that allows data to move, free of headers, directly to or from a destination or source in the host. The card contains specialized hardware circuits that are faster at their specific tasks than a general purpose CPU. The disclosures also describe a trio of pipelined processors with respective processors for transmission, reception, and management processing.
U.S. Pat. No. 6,247,068 to Kyle, whose disclosure is incorporated herein by reference, describes a hardware accelerator for performing protocol acceleration. The accelerator uses hardware decoders each configured to perform decoding for a particular protocol interface. A protocol processor is connected to a data link library and accesses appropriate programs in a data link library to achieve the protocol acceleration.