Technical Field
This patent document relates generally to distributed computer systems and to computer-to-computer communication via computer networks.
Brief Description of the Related Art
Internet-scale distributed systems and applications often use one or more transport layer relays for connections that span long network latencies. The relay manages two connections, connecting (splicing) the two connection segments, an arrangement also called “split TCP”. The goal is to achieve higher end-to-end throughput, lower response time, and higher reliability than a single long, unsegmented connection can provide.
For example, a content delivery network (CDN) platform may use TCP relays. A CDN may be considered an overlay across the Internet on which communication efficiency can be improved. Improved communications on the overlay can help when a proxy server in the CDN needs to obtain content from an origin server, or otherwise when accelerating non-cacheable content for a content provider customer. Communications between CDN servers and/or across the overlay may be enhanced or improved using transmission control protocol (TCP) splicing to effect improved route selection, protocol optimizations including TCP enhancements, persistent connection reuse and pooling, and other techniques such as those described in U.S. Pat. Nos. 6,108,703, 6,820,133, 7,274,658, 7,607,062, and 7,660,296, among others, the disclosures of which are incorporated herein by reference. The CDN overlay (and ability to relay packets) may also be leveraged for WAN optimization. In such cases, CDN appliances or software executing in customer branch offices can connect through the overlay to third party Internet hosted applications and resources, and/or to applications and resources at a customer central data center, the latter providing an accelerated corporate intranet.
FIG. 1 shows a typical connection relay with two endpoints (Hosts 100 and 104) and an intermediate node 102. The two hosts are transferring data. It is known in the art that, if the hosts 100, 104 are far away from each other such that the network latency between them is high, placing an intermediate host 102 between them as a relay between two connection segments, and routing data through this host, often helps achieve higher end-to-end performance. This is referred to as “split connections” or, for the common case of using TCP for transport layer communications, as the aforementioned “split TCP”. The use of the relay tends to enhance performance because of the cost of recovering from packet loss in reliable data transfer algorithms like TCP: the recovery cost on either one of the segments (Connection 106 and Connection 108) is lower than that on Connection 110, because each segment has a shorter round-trip time. Another factor in end-to-end performance is the connection establishment time, especially for transmissions of data units comparable in size to the TCP congestion window. It is therefore preferable to use persistent connections where possible to optimize end-to-end performance.
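The benefit of splitting can be illustrated with the well-known Mathis approximation for steady-state TCP throughput under loss, throughput ≈ (MSS/RTT)·(C/√p). The sketch below uses hypothetical numbers and a deliberately simplified model (it assumes the relay sits midway and ignores relay processing time), but it shows why halving the round-trip time of each segment raises the end-to-end bound:

```python
from math import sqrt

def mathis_throughput(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Steady-state TCP throughput bound (bytes/s) per the Mathis approximation."""
    return (mss_bytes / rtt_s) * (c / sqrt(loss_rate))

# Hypothetical figures: 1460-byte MSS, 200 ms end-to-end RTT, 1% loss.
MSS, RTT, LOSS = 1460, 0.200, 0.01

# Direct case (Connection 110): one segment carrying the full RTT.
direct = mathis_throughput(MSS, RTT, LOSS)

# Split case: relay placed midway, so each segment (Connections 106, 108)
# sees roughly half the RTT; end-to-end rate is limited by the slower segment.
split = min(mathis_throughput(MSS, RTT / 2, LOSS),
            mathis_throughput(MSS, RTT / 2, LOSS))

# In this model, halving the per-segment RTT doubles the throughput bound.
```

Under this approximation, loss recovery on each shorter segment is cheaper simply because retransmissions and congestion-window growth are paced by the segment's own (shorter) round-trip time.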
To better understand split TCP, consider first the direct model, in which hosts 100 and 104 connect directly. In this model, the connection is established as follows:
1) Host 100 sends a connection request to Host 104.
2) Host 104 accepts the connection request from Host 100.
3) Host 100 creates an end point for the connection.
4) Host 104 creates an end point for the connection.
5) Hosts 100 and 104 agree on the TCP parameters for subsequent data transfer via the connection.
6) The connection between the Hosts 100, 104 is now established. The connection comprises an exclusive end point at Host 100, an exclusive end point at Host 104, and a set of parameters both hosts agreed upon to coordinate subsequent data transfer between them.
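The steps above can be sketched with BSD-style sockets, where the kernel carries out the three-way handshake and parameter agreement on the application's behalf. The loopback address and the echoed message are illustrative only:

```python
import socket
import threading

def run_host_104(listener):
    # Host 104 accepts the connection request (step 2); the kernel
    # allocates an exclusive end point for the connection (step 4).
    conn, peer = listener.accept()
    conn.sendall(conn.recv(1024))  # echo one message over the connection
    conn.close()

# Host 104 prepares to receive connection requests on an ephemeral port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # loopback stands in for Host 104
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=run_host_104, args=(listener,), daemon=True).start()

# Host 100 sends a connection request (step 1); connect() returns once the
# TCP three-way handshake has completed, i.e., both end points exist and a
# parameter set has been agreed upon (steps 3, 5, 6).
host_100 = socket.create_connection(("127.0.0.1", port))
host_100.sendall(b"hello")
reply = host_100.recv(1024)
host_100.close()
```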
This general TCP connection model for the direct case does not dictate the order of events, however. For implementation purposes, the end point allocation may happen either before or after the mutual agreement on the parameter values. Details can be found in the related IETF standards documents.
FIG. 2 illustrates the split TCP case. In this model, the connections are established as follows:
1) Host 100 establishes a connection 106 with Host 102, following the procedure of the direct model.
2) In the subsequent data transfer mode, Host 100 sends a message to Host 102 indicating that Host 100 needs to communicate with Host 104 via Host 102, or sends data that Host 102 understands must be forwarded to a further destination (e.g., Host 104 or someplace further down the line). Note that Host 100 need not signal its intention of communicating with Host 104 only after establishing connection 106; such an intention can instead be included in the connection request from Host 100 to Host 102.
3) Host 102 establishes a connection 108 with Host 104, following the procedure of the direct model.
4) Host 102 begins splicing the two connections 106 and 108. Each connection 106, 108 has an independent set of values for the TCP parameter set. Both connections are terminated at Host 102.
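The steps above can be sketched with a minimal user-space relay standing in for Host 102: it accepts connection 106, establishes connection 108 to the destination, and splices the two by forwarding bytes in both directions. Addresses, ports, and the echo payload are illustrative only:

```python
import socket
import threading

def relay(listener, dest_addr):
    """Host 102: accept connection 106, establish connection 108, splice."""
    conn_106, _ = listener.accept()                  # step 1 completes here
    conn_108 = socket.create_connection(dest_addr)   # step 3
    # Step 4: splice the two independent TCP connections, each of which
    # terminates at Host 102 with its own parameter set.
    def pump(src, dst):
        while (chunk := src.recv(4096)):
            dst.sendall(chunk)
        dst.shutdown(socket.SHUT_WR)
    threading.Thread(target=pump, args=(conn_106, conn_108), daemon=True).start()
    pump(conn_108, conn_106)

def echo_server(listener):
    """Host 104: echo whatever arrives, then close."""
    conn, _ = listener.accept()
    conn.sendall(conn.recv(4096))
    conn.close()

def listen_on_loopback():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    return s, s.getsockname()[1]

host_104_sock, port_104 = listen_on_loopback()
host_102_sock, port_102 = listen_on_loopback()
threading.Thread(target=echo_server, args=(host_104_sock,), daemon=True).start()
threading.Thread(target=relay, args=(host_102_sock, ("127.0.0.1", port_104)),
                 daemon=True).start()

# Host 100 communicates with Host 104 via the relay at Host 102.
host_100 = socket.create_connection(("127.0.0.1", port_102))
host_100.sendall(b"payload")
host_100.shutdown(socket.SHUT_WR)
echoed = host_100.recv(4096)
host_100.close()
```

Note that in this sketch the relay connects to a fixed destination; in practice the destination would be derived from Host 100's connection request or from the forwarded data, as described in step 2.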
In the final form of spliced TCP, there are two split TCP connections as shown in FIG. 2. Importantly, the splicing functionality manages the two TCP end points at Host 102 in order to connect (splice) the two connections 106, 108.
FIG. 3 illustrates, in an abstract form, a typical implementation of the relay at node 102. As can be seen, each incoming packet from one connection at the NIC (Network Interface Controller) at node 102 proceeds through all the intermediate layers to the relay module at the application layer, to be spliced to the other connection. Once the packet reaches the relay module, its header information is rewritten (the connection ID, IP addresses and port numbers, and status information are changed from one connection to the other), and the packet is then sent back down the stack to the NIC, where it is forwarded to the next hop.
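The per-packet path through the relay module can be sketched as an application-layer loop: each recv() is a kernel-to-user copy and each sendall() a user-to-kernel copy, so every relayed byte crosses the user/kernel boundary twice. In this illustrative sketch, socketpair() stands in for the two spliced connections 106 and 108:

```python
import select
import socket
import threading

def splice_loop(sock_a, sock_b):
    """Application-layer splice at Host 102: shuttle bytes between the two
    connection end points until either side closes."""
    peer = {sock_a: sock_b, sock_b: sock_a}
    while True:
        readable, _, _ = select.select([sock_a, sock_b], [], [])
        for src in readable:
            data = src.recv(4096)       # copy: kernel -> user space
            if not data:                # peer closed; stop relaying
                return
            # The "header change" is implicit: the same payload is written
            # into the other connection, whose addresses, port numbers, and
            # TCP state belong to the other segment.
            peer[src].sendall(data)     # copy: user space -> kernel

# socketpair() stands in for the two spliced connections 106 and 108.
end_106, relay_106 = socket.socketpair()
end_108, relay_108 = socket.socketpair()
threading.Thread(target=splice_loop, args=(relay_106, relay_108),
                 daemon=True).start()

end_106.sendall(b"packet")      # arrives at the relay over "connection 106"
forwarded = end_108.recv(4096)  # emerges on "connection 108"
```

This makes the cost structure discussed below concrete: relaying a single payload involves two boundary crossings plus the associated interrupts and context switches.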
The selection of Host 102 as the relay from amongst a set of candidate intermediate nodes is typically determined by a global routing function, as is the selection of the next hop. Examples include CDN mapping and routing mechanisms that return candidate machines for a given DNS lookup. A distributed instance of such a global routing function is shown in FIG. 3 as “Router/Relay”.
This model incurs overhead at each layer that the packet traverses, both up and down the stack. For example, each packet received by the NIC must be copied into the kernel and then into user space. Each packet copied from the NIC into the kernel arrives with a hardware interrupt, which necessarily incurs a context switch (and hence CPU overhead). The copy from the kernel to user space comes with a software interrupt, which also incurs a context switch.
While Internet Protocol (IP) layer and lower layer forwarding in hardware may be relatively straightforward, doing so for reliable transport layer communications is not. The reliability guarantees offered by TCP introduce message complexity and requirements for keeping state that are beyond existing forwarding table implementations. And while it is known to implement TCP in a network interface card, such implementations are inflexible and expensive, and do not address the need to manage packet forwarding, e.g., for determining where to send a given flow.
There is a need for improved packet relay efficiency at relay nodes in the split connection and/or split TCP scenario. The teachings herein address these needs and also provide other benefits and improvements that will become apparent in view of this disclosure.