1. Field of the Invention
This invention relates to data telecommunications systems and multiplex communications techniques, and more specifically to the enhancement of hardware and software used with Performance Enhancing Proxies (PEPs), to optimize the performance of the Transmission Control Protocol (TCP) in the bidirectional transmission of data packets over satellite links.
2. Description of Related Art
The Internet is a world-wide computer super-network, which is made up of a large number of component networks and their interconnections. Computer networks may consist of a wide variety of connected paths or network links serving to transport user information in the form of data between a diverse array of computer end systems. Different network links are more or less suitable for different network requirements. For example, a fiber optic cable typically provides a high bandwidth, low per bit cost, low error rate and low delay point-to-point network link. Alternatively, for example, a satellite link typically provides a lower bandwidth, higher per bit cost, higher error rate and longer delay point-to-multi-point network link. The wide variety of links and thus link characteristics encountered on the Internet, or other private Internet Protocol (IP) based networks, have a variety of effects on the behavior of protocols in the IP suite.
IP primarily provides the routing functionality for packets (bits or bytes of data) over a network. It acts at the network layer to direct packets from their sources to their destinations. Transmission Control Protocol (TCP) is the reliable transport layer protocol of the IP suite of protocols and as such, layers on top of IP, provides reliability to applications, and builds on IP's unreliable datagram (packet) service. TCP underlies the vast majority, estimated to be around 90%, of all the traffic on the Internet. TCP supports the World Wide Web (WWW), electronic mail (email) and file transfers, along with other common applications. TCP was introduced in 1981 and since then has evolved in many ways, but today still provides reliable and largely efficient service over a wide variety of links as evidenced by its omnipresent nature. However, there are a variety of conditions under which TCP may perform below expectations, geosynchronous satellite links being one prime example. The problems of TCP over satellites have been previously documented. TCP performance is typically degraded to some extent in terms of lowered throughput and link utilization by, but not limited to, the following link characteristics: long delay, high bandwidth, high error rate, link asymmetry and link variability, all of which may be encountered on satellite and similar links.
In response to the established use of TCP and also of certain link types (such as satellite) which are not ideal for TCP, Performance Enhancing Proxies (PEPs) have been introduced. TCP performance over geosynchronous satellite (GEO) links is traditionally very poor from a user perspective in terms of transfer time and throughput for web browsing and file transfer, among other applications relying on a TCP transport layer.
PEPs may function as one or more devices or pieces of software placed in the end-to-end path that suffers TCP performance degradation. PEP units may, for example, surround a satellite link. PEPs modify the traffic flow in an attempt to alleviate the issues of TCP traffic on a specific link. PEPs may use many methods, either alone or in concert, to enhance performance.
One type of PEP, known as a distributed, connection-splitting PEP, is commonly chosen due to the fact that it allows for the use of a proprietary protocol across the satellite link. This protocol can then be chosen or designed to mitigate problems specific to the link. A distributed PEP uses more than one PEP device in an end-to-end connection; often two PEP devices are used. If two PEP devices are used, the end-to-end connection may be split into three connection segments. The end connections must remain TCP for compatibility, but the inter-PEP connection may use any protocol. Several protocols are available for use on the satellite link that provide improved performance over that of TCP. Examples of these protocols are Xpress Transport Protocol (XTP), Satellite Transport Protocol (STP), Space Communications Protocol Standards - Transport Protocol (SCPS-TP), standard or enhanced User Datagram Protocol (UDP) or even non-standard (modified) TCP. In addition to the protocol used, there are also many ways in which a PEP device may handle processing between connection segments in this type of system.
One of the link characteristics that affects TCP performance is delay. Links such as those over GEO satellites have long delays, for example, around 500 ms or more. Several TCP mechanisms that control connection setup, flow control and error correction through retransmission may be adversely affected by long delay links.
For transfers that are typically short in duration, such as web pages, the delays involved in establishing TCP connections make a proportionally larger contribution to the transfer time, and therefore to the mean data throughput rate. Additionally, a user typically begins to view a web page as it is downloading, so an initial delay before any material is displayed may frustrate the user and may consequently cause re-requests, which lower system efficiency.
The delay in connection opening is caused by a mechanism known as the TCP Three Way Handshake (3WHS). The purpose of this exchange of messages is to ensure that the other end point is present, and thereby to promote the reliability of a transfer. A connection initiator sends a packet with the SYN (synchronize) flag set. A responding system sends back a packet with the SYN and ACK (acknowledgement) flags set. The ACK flag acknowledges the initiator's SYN. The initiator then sends a final ACK packet acknowledging the responder's SYN. From this point on the initiator may send data. Thus, the delay from initiation to sending data on a TCP connection is a whole Round Trip Time (RTT).
When opening a TCP connection in a distributed split-connection PEP implementation, there are two main options and then variants thereof. To preserve the end-to-end behavior and reliability of the connection, the connection should be opened end-to-end by the endpoints and not by the PEP devices. Although more reliable than the alternatives, this method involves a full RTT of overhead during which no data is transferred. An alternative method involves accelerating the opening of certain connections, such as web connections, which are of short duration and thus more heavily affected by extra RTTs.
An initiator sends a SYN packet to a PEP and the PEP responds locally with the SYN/ACK packet to the initiator. The initiator then responds with the ACK packet and the first data packet, which in the case of a web transfer is an HTTP request packet. The PEP then combines the original SYN packet (which it has held) and the first data packet and sends them over the satellite link to the other PEP device. The lower RTT on the terrestrial link means that the time taken to send the first request is reduced.
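The timing difference between the two opening methods described above can be illustrated with a rough calculation. The following Python sketch is illustrative only. The 500 ms satellite RTT is taken from the discussion above, while the 20 ms terrestrial RTTs, the function name and the simplified model (no processing or serialization delay) are assumptions made for this example.

```python
def time_to_first_data_ms(segment_rtts_ms, accelerated):
    """Delay between sending the SYN and being allowed to send the first data packet.

    segment_rtts_ms lists the RTT of each connection segment in order,
    starting with the segment between the initiator and the nearest PEP.
    """
    if accelerated:
        # The nearest PEP answers the SYN locally, so only the first
        # segment's RTT is spent before the initiator may send data.
        return segment_rtts_ms[0]
    # End-to-end open: the SYN and SYN/ACK cross every segment.
    return sum(segment_rtts_ms)


# Assumed example path: terrestrial, satellite, terrestrial.
path = [20.0, 500.0, 20.0]
print(time_to_first_data_ms(path, accelerated=False))
print(time_to_first_data_ms(path, accelerated=True))
```

Under these assumed figures the initiator waits for the full end-to-end RTT in the first case and only for the local terrestrial RTT in the second.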
A problem with the above accelerated opening is that it is possible to open a connection locally that might then fail to establish end-to-end, resulting in a desynchronized state. This state will eventually time-out. However, during the time that the two endpoints are desynchronized, the user will be confused, as the connection appears to be established but no data will be transferred, which again could lead to the user re-attempting the connection several times and wasting bandwidth.
As described earlier, the Internet is a collection of networks and interconnections. These interconnections and network links each have their own characteristics. One characteristic is the Maximum Transmission Unit (MTU) size. This value, often expressed in bytes, is the maximum data payload that may be encapsulated and carried over the link layer of the ISO/OSI 7-layer model without being broken down into a smaller unit. Two common technologies for LAN links are Ethernet and the similar, but not identical, IEEE 802.3 standards. Ethernet allows for the encapsulation of a 1500 byte IP packet (1500 byte MTU) while 802.3 encapsulation allows for a 1492 byte MTU. It can be imagined that in a network of heterogeneous links there will, sometimes, not be one common MTU for any path between points A and B in a given network or path through the Internet.
In response to the recognition that any given path through a network may not have a consistent MTU for all hops, the IP protocol allows for fragmentation of IP packets. If the IP layer at a host or router is unable to send a packet of the desired size onto the link, the IP layer will split that packet up into several smaller packets. When this behavior occurs at a router between ports, it is known as fragmentation and is commonly recognized to have detrimental side effects, such as lowering maximum data rate (through additional header bytes and also packet processing overhead at network nodes) and impacting efficiency. However, fragmentation is necessary to allow the data to pass end-to-end.
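The header overhead introduced by fragmentation can be illustrated with a short calculation. The following Python sketch is a simplified illustration, not a complete IP implementation. It assumes a 20-byte IPv4 header without options, and the function name is hypothetical. It relies on the IPv4 rule that every fragment except the last carries a payload that is a multiple of 8 bytes.

```python
IP_HEADER_BYTES = 20  # assumption: IPv4 header without options


def fragment_sizes(packet_bytes, mtu_bytes):
    """Return the total size in bytes of each packet sent on a link with the given MTU."""
    if packet_bytes <= mtu_bytes:
        return [packet_bytes]  # fits as-is, no fragmentation needed
    payload = packet_bytes - IP_HEADER_BYTES
    # Payload of every fragment except the last must be a multiple of 8 bytes.
    max_fragment_payload = (mtu_bytes - IP_HEADER_BYTES) // 8 * 8
    if max_fragment_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    sizes = []
    while payload > 0:
        chunk = min(payload, max_fragment_payload)
        sizes.append(chunk + IP_HEADER_BYTES)  # each fragment repeats the IP header
        payload -= chunk
    return sizes


# A full-size Ethernet packet forwarded onto an 802.3 link (MTUs from the text above).
fragments = fragment_sizes(1500, 1492)
print(fragments, sum(fragments) - 1500)
```

In this example a single 1500 byte packet becomes two packets, one of which carries only 8 bytes of payload, so the link carries an extra header and the network nodes process twice as many packets.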
In an attempt to avoid fragmentation, the process of Path MTU Discovery (PMTUD) was introduced. The purpose of this process is to try to detect the minimum MTU in the path from source to destination. This value is dynamic if the route changes. The IP header has a flag, which may be set to inform intermediate network nodes (i.e. any devices in the network between the source and destination) not to fragment a packet. This flag is known as the Don't Fragment (DF) flag. When the DF flag is set, a router should discard the packet if it is too large to forward on the outgoing interface. The router should also send an Internet Control Message Protocol (ICMP) Can't Fragment (ICMP type 3 [destination unreachable], code 4 [fragmentation needed but don't-fragment bit set]) message back to the originator of the packet. This packet should contain the MTU of the outgoing interface on the router to inform the sender of the limiting MTU. Through this mechanism, a sender may adapt to the path MTU and avoid fragmentation. This mechanism is therefore desirable for efficiency reasons.
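The adaptation loop described above can be sketched as a small simulation. The following Python code is a minimal model only. It does not send real packets, and the function names and the example hop MTUs are assumptions chosen for illustration. Each failed probe stands for a packet discarded by a router that returns an ICMP Can't Fragment message carrying the MTU of its outgoing interface.

```python
def forward_with_df(packet_bytes, hop_mtus):
    """Simulate forwarding a packet that has the DF flag set.

    Returns None if the packet reaches the destination, otherwise the MTU
    that the discarding router would report in its ICMP message.
    """
    for mtu in hop_mtus:
        if packet_bytes > mtu:
            return mtu  # packet discarded; limiting MTU reported to the sender
    return None


def discover_path_mtu(first_hop_mtu, hop_mtus):
    """Return (path MTU estimate, number of packets sent) for the simulated path."""
    estimate = first_hop_mtu
    packets_sent = 0
    while True:
        packets_sent += 1
        reported = forward_with_df(estimate, hop_mtus)
        if reported is None:
            return estimate, packets_sent
        estimate = reported  # adapt to the reported MTU and try again


# Assumed example path with a 1400 byte bottleneck behind a 1492 byte link.
print(discover_path_mtu(1500, [1500, 1492, 1400, 1492]))
```

Each reported MTU is strictly smaller than the packet that was discarded, so the estimate decreases on every iteration and the loop terminates at the minimum MTU in the path.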
Currently, there is little guidance on how PMTUD should function in the presence of PEPs. In the absence of guidance, it is currently left to the decision of each PEP designer or manufacturer on how to handle the PMTUD mechanism at a PEP. One solution requires that ICMP messages pass through its PEP devices without modification. This allows for the sender to adapt its path MTU estimate and send smaller packets in the future.
However, a problem exists in a connection-splitting distributed PEP, due to the fact that the PEP devices are often buffering packets that are in transit between the endpoints. These packets have been acknowledged to the sending endpoint and are, therefore, no longer buffered by the endpoint itself for retransmission. Therefore, if a router drops a packet after the second PEP in the connection and an ICMP Can't Fragment message is sent to the originator, a problem occurs. The originator is able to lower the Path MTU estimate but cannot retransmit the data in the original packet. The second PEP in the connection has a copy of the packet buffered so may retransmit when no TCP acknowledgement arrives, but will not understand that the packet must be resized to a smaller packet to arrive successfully at the destination. Therefore, a deadlock may occur until several retransmissions of the packet have failed and the connection has to be reset.
One solution to this problem is to disable PMTUD when a PEP is included in the end-to-end connection, which allows the connections using the PEP to function correctly. This, however, is not ideal, because fragmentation and its associated inefficiencies can then no longer be avoided. Hence, problems exist in the current technique for PMTUD when PEPs are used.
Each protocol used on the Internet has its own packet format, which specifies the way that information is encoded in headers and where data begins in a packet, among other things. The TCP packet format includes the TCP header and space in the header for optional fields known as TCP options. Distributed connection-splitting PEPs may use other (non-TCP) standard protocols and possibly proprietary protocols between the two PEP devices. These non-TCP protocols are used to gain performance advantages over end-to-end TCP and even split-connection TCP. Performance, however, is only one aspect of a PEP, although the most important. A PEP must also be compatible with the end hosts and the TCP protocol. If the PEP-to-PEP protocol does not support the transfer of certain TCP information from end to end, then functionality will be lost; the TCP urgent pointer, which is used to expedite transfer of portions of the data stream, is one example.
When choosing or designing a protocol for the problematic link there is, therefore, a tradeoff between efficiency and compatibility. If using an entirely different protocol, it may be necessary to carry the TCP information in extra header structures, which may increase the packet overhead on each packet. Increasing packet overhead may also trigger IP fragmentation for packets that were originally the maximum size for the link; this should be avoided. Also, the end-to-end path over which the connection travels may have intermediate equipment that does not know how to handle unknown protocols. For example, Network Address Translation (NAT) devices may perform translation of the IP address fields and sometimes layer 4 protocol port numbers also. These types of operations can then require the checksum fields to be updated. If a protocol is not recognized, it may not be able to function properly at, for example, the NAT device or packets may pass the NAT device but be unrecognizable at the receiver. Additionally, the functionality of a newly designed protocol will impose constraints on the information that must be carried in each packet. For the proprietary protocol chosen for use with the PEP design of this invention, no pre-existing packet structure was considered appropriate.
For problematic links, TCP has been improved by several different mechanisms to address different issues. For the case of packet and acknowledgement loss, TCP has been improved by the addition of the Selective Acknowledgement (SACK) option. This allows TCP packet headers to carry information on contiguous blocks of packets that have been successfully received. This mechanism adds overhead to each packet and although the overhead is only a small percentage on large packets (around 1% on a 1500 byte packet), the percentage overhead on a standard acknowledgement packet is much larger. For a 40-byte packet, an extra 12 to 20 bytes of SACK information is between an extra 30 and 50% of the original packet size. More seriously, if the TCP acknowledgements are carried over a link layer protocol such as Asynchronous Transfer Mode (ATM), a TCP acknowledgement with SACK information may no longer fit within a single ATM cell. If, instead, two cells are required for the acknowledgement then acknowledgement traffic volume is, in effect, doubled. If, for example, this is the return channel on a satellite system such as the Digital Video Broadcast-Return Channel Satellite (DVB-RCS) where most traffic may be acknowledgement traffic, then the total traffic volume may also be nearly doubled.
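The overhead figures above can be checked with a short calculation. The following Python sketch is illustrative only. The 48-byte ATM cell payload is standard, but the use of AAL5 encapsulation with an 8-byte trailer and no additional encapsulation header is an assumption, as are the function names.

```python
ATM_CELL_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8  # assumption: AAL5 framing with no extra encapsulation header


def sack_overhead_percent(packet_bytes, sack_bytes):
    """SACK information as a percentage of the original packet size."""
    return 100.0 * sack_bytes / packet_bytes


def atm_cells_needed(ip_packet_bytes):
    """Number of ATM cells needed to carry one IP packet under the assumed framing."""
    total = ip_packet_bytes + AAL5_TRAILER_BYTES
    return -(-total // ATM_CELL_PAYLOAD_BYTES)  # integer ceiling division


for sack_bytes in (0, 12, 20):
    ack_bytes = 40 + sack_bytes  # 40-byte acknowledgement plus SACK information
    print(sack_bytes, sack_overhead_percent(40, sack_bytes), atm_cells_needed(ack_bytes))
```

Under these assumptions a plain 40-byte acknowledgement exactly fills one cell, so even the smallest amount of SACK information considered here pushes the acknowledgement into a second cell.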
TCP also uses a cumulative acknowledgement scheme to signal correct reception of packets to the sender. Optionally, TCP may use the SACK option described earlier if higher packet loss rates are expected, as may often be the case over satellite links, for example. Whether standard TCP acknowledgements are used or whether the SACK option is used, the same method of acknowledgement must be used throughout the duration of the connection. If the error conditions on the link change during the course of the transfer, the connection performance may be adversely impacted if an inappropriate acknowledgement method is chosen. For example, if the standard TCP acknowledgement scheme is selected, the TCP transfer may suffer very poor performance or even failure under heavy error conditions. If the SACK scheme is chosen, the additional overhead, as described above, may be incurred even if the SACK scheme is not needed. TCP is unable to adapt the acknowledgement scheme to changing error conditions during the course of the connection. This problem exists with the conventional systems in the area of acknowledgement of packets.
TCP also uses a timer as one method of detecting lost packets and triggering retransmissions. However, in the conventional systems, only one timer is used regardless of how many packets are being sent. TCP uses the timer in the following manner. When there are no packets in transit, the timer is off. When the first packet is transmitted, the timer is set. When a packet is acknowledged and other packets are still in transit, the timer is reset. Therefore it may take different amounts of time to detect a packet loss depending upon which packet in a group is lost. In the worst case it may take up to the timer timeout value plus the round trip time to detect a loss. This time period may be almost twice as long as the detection period for loss of the first packet. In the ideal case, every loss should be detected as quickly as possible.
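The dependence of the detection time on the position of the lost packet can be illustrated with a simplified model of the single timer. The following Python sketch is an illustration under stated assumptions and not an implementation of TCP. It assumes that exactly one packet is lost, that every other packet is acknowledged one RTT after it is sent, and that only acknowledgements of new data restart the timer. The function name and the example values (500 ms RTT, 600 ms timeout, packets sent 10 ms apart) are hypothetical.

```python
def loss_detection_delay_ms(send_times_ms, lost_index, rtt_ms, timeout_ms):
    """Time from sending the lost packet until the single timer expires.

    send_times_ms must be in non-decreasing order.
    """
    lost_sent_at = send_times_ms[lost_index]
    if lost_index == 0:
        # The timer is set when the first packet is transmitted.
        timer_started_at = lost_sent_at
    else:
        # The last restart happens when the preceding packet is acknowledged.
        # If that acknowledgement arrived before the lost packet was sent,
        # the timer was off and is set again when the lost packet is sent.
        previous_ack_at = send_times_ms[lost_index - 1] + rtt_ms
        timer_started_at = max(lost_sent_at, previous_ack_at)
    return timer_started_at + timeout_ms - lost_sent_at


send_times = [0, 10, 20, 30]
print(loss_detection_delay_ms(send_times, 0, rtt_ms=500, timeout_ms=600))
print(loss_detection_delay_ms(send_times, 3, rtt_ms=500, timeout_ms=600))
```

In this model the loss of the first packet is detected after one timeout period, whereas the loss of a later packet in the group is detected only after nearly the timeout period plus the RTT.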
Additionally, and perhaps more importantly, if an acknowledgement scheme is used in which repeated retransmission triggers occur for the same packet, the single packet timer provides no indication of how long an individual packet has been in transit. This means that it is not possible to know whether a transmitted packet has had time to be acknowledged. In this case, it is possible to retransmit a packet before it has had time to reach the destination and before an acknowledgement could be returned and received. This scenario lowers the efficiency of the link, as packets are transmitted multiple times unnecessarily. This is a problem in the conventional systems related to controlling or limiting unnecessary retransmissions.