The Transmission Control Protocol (TCP) is one of the core protocols of the Internet protocol suite. Using TCP, applications on networked hosts can create connections to one another, over which they can exchange data. In the Internet protocol suite, TCP is the intermediate layer between the Internet Protocol (IP) below it and an application above it. Applications often need reliable, pipe-like connections to each other, whereas the Internet Protocol does not provide such streams, but only unreliable packets. TCP performs the task of the transport layer in the simplified OSI model of computer networks.
One of the strengths of TCP is its ability to guarantee reliable delivery of data. TCP accomplishes this reliability by tolerating packet loss through (most generally) timing out and repeatedly retransmitting. Because of this time-out and retransmit mechanism, TCP can take a long time to detect a failed path in the network.
Generally, a failed path will be seen by a TCP sender as a “lost packet”, which is in turn detected because a data-carrying packet that the sender has transmitted is not acknowledged by the receiver within a timeout period. However, not all lost packets imply path failure. In fact, it is quite common for IP packets to be dropped somewhere in the network as a result of the “best-effort” delivery model that is intentionally part of the IP design (it would be very undesirable to require guaranteed delivery at the IP level for reasons well understood in the art). Transient packet loss is caused either by network congestion (when a router queue overflows) or by data corruption (e.g., because of errors on a wireless link). TCP normally assumes that a packet loss indicates network congestion, and so TCP retransmits a lost packet, multiple times if necessary, with the goal of ensuring reliable delivery.
TCP also effectively reduces its transmission rate to compensate for the implied congestion. Otherwise, the too-aggressive use of retransmissions would worsen the network congestion. In times of no packet loss, TCP gradually increases its transmission rate, to try to maximize network utilization.
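The rate adjustment described above can be illustrated with a deliberately simplified additive-increase, multiplicative-decrease (AIMD) model. This is only a sketch under stated assumptions: the function names, the one-segment increase per round trip, and the halving factor are illustrative choices, and real TCP implementations add further phases (such as slow start and fast recovery) that are not modeled here.

```python
from typing import Iterable, List


def aimd_step(cwnd: float, loss: bool,
              increase: float = 1.0, decrease: float = 0.5,
              floor: float = 1.0) -> float:
    """Return the next congestion window (in segments) after one round trip.

    Hypothetical helper: on loss the window is cut multiplicatively,
    otherwise it grows additively. Parameter values are assumptions.
    """
    if loss:
        # Loss is taken as a congestion signal: back off sharply.
        return max(floor, cwnd * decrease)
    # No loss: probe gently for more capacity.
    return cwnd + increase


def simulate(loss_pattern: Iterable[bool], start: float = 1.0) -> List[float]:
    """Apply aimd_step once per round trip and record the window each time."""
    cwnd = start
    history = []
    for loss in loss_pattern:
        cwnd = aimd_step(cwnd, loss)
        history.append(cwnd)
    return history


if __name__ == "__main__":
    # Three loss-free round trips, one loss, then one more loss-free round trip.
    print(simulate([False, False, False, True, False]))
```

In this model a single loss undoes several round trips of growth, which is why a sender that treats every loss as congestion recovers its rate only gradually.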
The choice of timeout for detecting packet loss is a compromise. If the timeout is too long, TCP will not utilize the available network capacity, and the overall throughput of useful data will be less than optimal. However, if the timeout is too short, TCP will retransmit too soon (too aggressively) and will cause unnecessary congestion, which can actually have a worse effect on overall throughput.
TCP, therefore, uses an adaptive algorithm to set its timeout value. There is an initial timeout value that is dynamically set based on measurements of the round-trip time (RTT) and its variance. The retransmission timeout starts out with this value, subject to a lower bound (typically one second), so that the sender never exceeds a limit of one retransmission per second. However, each time a retransmission fails to elicit a response within the current timeout period, the timeout is doubled, up to a configured maximum, and TCP re-attempts retransmission. Because of this “exponential backoff,” which is specified so as to prevent congestion, TCP can take a long time to detect failed network paths (i.e., to realize that there is a persistent failure of connectivity, rather than transient packet loss due to causes such as congestion).
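The adaptive timeout and its backoff can be sketched as follows. The smoothing constants (1/8 and 1/4) and the "smoothed RTT plus four times the variation" formula follow the standard retransmission timer computation of RFC 6298; the class and function names, the 60-second cap, and the omission of clock granularity are assumptions made for illustration only.

```python
from typing import List, Optional


class RtoEstimator:
    """Hypothetical estimator of the retransmission timeout (RTO), in seconds."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation

    def __init__(self, min_rto: float = 1.0, max_rto: float = 60.0) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.min_rto = min_rto  # lower bound, typically one second
        self.max_rto = max_rto  # assumed configured maximum

    def sample(self, rtt: float) -> float:
        """Fold one RTT measurement into the estimate and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Variation is updated before the smoothed RTT, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        raw = self.srtt + 4 * self.rttvar
        return min(self.max_rto, max(self.min_rto, raw))


def backoff_timeouts(initial_rto: float, attempts: int,
                     max_rto: float = 60.0) -> List[float]:
    """Timeout waited after each of `attempts` unanswered transmissions.

    The first entry is the wait after the original transmission; every later
    entry doubles the previous one, capped at max_rto.
    """
    timeouts = []
    rto = initial_rto
    for _ in range(attempts):
        timeouts.append(rto)
        rto = min(max_rto, rto * 2)
    return timeouts


if __name__ == "__main__":
    est = RtoEstimator()
    print(est.sample(0.1))                  # clamped up to the one-second floor
    print(sum(backoff_timeouts(1.0, 6)))    # total seconds spent waiting
```

Even with the minimum one-second starting value, six consecutive unanswered transmissions occupy 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds, which illustrates why relying on retransmission alone makes path failure detection slow.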
If fast detection of path failure is a priority, it would be a mistake to tune TCP's timeout and retransmission mechanisms solely for that purpose, since this can lead to network congestion through excessively aggressive retransmission, as described above. Instead, another mechanism for quickly detecting path failure would be desirable.
Previous solutions for fast path failure detection, other than waiting for TCP to detect the failure, fall into several categories. First, using a TCP keepalive timer: TCP includes an optional “keepalive” timer, which sends a special TCP probe packet if the connection has been idle for 2 hours. This is a rather long time to detect path failure; also, the keepalive mechanism is not meant for use when there is pending data to send. Other path failure detection schemes use a periodic heartbeat: for example, the SCTP transport protocol periodically exchanges “heartbeat” packets when no other data is being sent. This can detect path failures, but the recommended heartbeat interval is 30 seconds. This could be too long for useful failover, but a shorter heartbeat could create excessive network load. Also, the heartbeat mechanism is not meant for use when there is pending data to send.
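For illustration, the keepalive timer is typically enabled per socket. The following is a minimal sketch, assuming a platform that exposes the usual socket options; `SO_KEEPALIVE` is widely available, whereas `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are platform-specific and may be absent. The helper name and the numeric values are hypothetical and are not recommendations.

```python
import socket
from typing import Dict


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, count: int = 5) -> Dict[str, int]:
    """Turn on TCP keepalive and, where supported, shorten its timers.

    Returns the options that were actually applied. Hypothetical helper;
    the default values are illustrative assumptions.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # These per-connection tuning options do not exist on every platform,
    # so each one is looked up and applied only if present.
    tunables = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # interval between probes
        ("TCP_KEEPCNT", count),         # unanswered probes before giving up
    )
    for name, value in tunables:
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
        except OSError:
            # The constant exists but the running system rejects it.
            pass
    return applied


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(enable_keepalive(s))
```

Shortening these timers does not remove the limitation noted above: keepalive probes are sent only on an idle connection, so they do not help when data is pending.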
Other schemes modify the TCP retransmission mechanism parameters: given that the normal TCP mechanisms take too long to detect path failure, it would be possible to detect path failures faster by changing parameters. However, reducing the interval between retransmission timeouts can lead to network congestion. Note that TCP normally “probes” the path using a full-sized data packet. There are various TCP mechanisms that cause TCP to prefer to send the largest possible amount of data with each packet. Because one cause of packet loss is congestion rather than path failure, sending large packets as network probes may actually exacerbate the problem. In fact, in many contexts, congestion is far more likely than path failure. So it is inadvisable to do excessive full-packet probing when there is a possibility of congestion, since it is likely to make congestion worse, perhaps even converting minor congestion into apparent path failure.
Yet another scheme sends periodic ICMP pings: if sending large packets as probes is a problem, one alternative is to send small non-TCP packets as probes for path connectivity. The obvious implementation is to send an ICMP Echo Request packet at intervals, expecting an Echo Reply packet in response. Because these packets are near-minimal in length, they are less likely to contribute to congestion than TCP retransmissions would. However, an excessive ping rate could still add to congestion. If the pinging process is not intelligent enough to be aware of the status of the TCP connection, there is still an undesirable compromise between pinging too frequently (and thereby wasting bandwidth and CPU time) and pinging too rarely (and failing to detect failure soon enough). Test pings may also be sent over paths that are not currently in use for TCP data transfer (and which therefore need not be monitored for failure), thereby wasting network and CPU resources.
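To show how small such a probe is, the sketch below builds an ICMP Echo Request (type 8, code 0) with the standard Internet checksum. It only constructs the bytes; actually sending them generally requires a raw socket and elevated privileges, which is outside this sketch. The function names, identifier, sequence number, and payload are illustrative assumptions.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int,
                       payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP Echo Request message (without IP header)."""
    icmp_type, icmp_code = 8, 0
    # The checksum is computed with its own field set to zero.
    header = struct.pack("!BBHHH", icmp_type, icmp_code, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", icmp_type, icmp_code, checksum,
                         identifier, sequence)
    return header + payload


if __name__ == "__main__":
    packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"probe")
    print(len(packet), "bytes")
```

The ICMP header itself is 8 bytes, so a probe with a short payload is far smaller than a full-sized TCP segment; the cost of pinging therefore lies mainly in how often probes are sent rather than in their size.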
In summary, previous solutions suffer from one or more of the following flaws: slow detection of path failures; contributing to network congestion; wasting network resources; wasting CPU (host) resources; and, when there are multiple connections between hosts, injecting multiple probes into the network using TCP's retransmission mechanism even when only one probe is necessary, thereby further increasing congestion.