Two of the most important communication protocols used on the Internet and other similar networks are the Transmission Control Protocol (TCP) and the Internet Protocol (IP). Together, the TCP and IP protocols form core protocols of the larger Internet protocol suite used on packet-switched networks. That protocol suite is commonly referred to as the TCP/IP protocol because of the widespread adoption and implementation of the TCP and IP protocols.
The TCP/IP protocol was developed for the United States Advanced Research Projects Agency (ARPA). The TCP/IP protocol is a set of rules that enables different types of network-enabled or networked devices to communicate with each other. Those network devices communicate by using the TCP/IP standard, or format, to transfer or share data. TCP/IP rules are established and maintained by the Internet Engineering Task Force (IETF). The IETF is an international community of network designers, operators, vendors, and researchers concerned with the Internet's architecture and operation. The IETF's mission is to produce technical and engineering documents that influence the way people design, use, and manage the Internet, with the goal of improving its operation and efficiency. These documents include protocol standards, best current practices, and informational updates of various kinds, and are commonly referred to as Requests for Comments (RFCs).
TCP can be used to establish a bi-directional connection between two clients wherein activity begins with a request for information made by one client to another client. A “client” is any program or application that initiates requests for or sends information from one remote location to another. As used herein, the term “client” may refer to such applications including, but not limited to, web browsers, web servers, file transfer protocol (FTP) programs, electronic mail programs, line printer (LPR) programs also known as print emulators, mobile phone apps, and telnet programs also known as terminal emulators, all of which operate conceptually in an application layer.
The TCP protocol is typically implemented as a “daemon” that is part of a TCP/IP stack of protocol layers. A daemon, also often referred to interchangeably as a server or service, is generally a software component of a device that runs a background process. As used herein in relation to the operation of the TCP protocol, the term “daemon” refers to a component of a networked device that sends (source daemon) or receives (destination daemon) and processes communications between remote clients according to the TCP standard.
A host is a device or system that runs or executes TCP/IP daemons. As used herein, the term “host” refers to any such device or system including, but not limited to, a server platform, a personal computer (PC), and any other type of computer or peripheral device that implements and runs TCP software. Generally, a host physically connects and links clients and daemons to TCP/IP networks, thereby enabling communication between clients.
TCP software accepts requests and data streams directly from clients and other daemons, sequentially numbering the bytes, or octets, in the stream during the time the connection is active. When required, it breaks the data stream into smaller pieces called segments (sometimes referred to as datagrams or packets generally) for transmission to a requesting client. The protocol calls for the use of checksums, sequence numbers, timestamps, time-out counters and retransmission algorithms to ensure reliable data transmission. [RFC 793, 1981]
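The byte numbering and segmentation described above can be illustrated with a minimal sketch. The function name `segment_stream`, the initial sequence number, and the 1460-byte segment size are assumptions chosen for illustration; they are not taken from any particular TCP implementation.

```python
from typing import List, Tuple

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit values that wrap around


def segment_stream(data: bytes, initial_seq: int, mss: int) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (sequence_number, payload) segments.

    Each segment carries at most `mss` bytes. Its sequence number is the
    number assigned to the first byte of its payload.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(data), mss):
        seq = (initial_seq + offset) % SEQ_SPACE
        segments.append((seq, data[offset:offset + mss]))
    return segments


# Hypothetical example: a 3000-byte stream starting at sequence number 1000.
for seq, payload in segment_stream(b"x" * 3000, initial_seq=1000, mss=1460):
    print(seq, len(payload))
```

Because every byte has its own number, a receiver can detect gaps and duplicates by comparing the sequence number and length of each arriving segment with what it has already received.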
The IP layer actually performs the communication function between two networked hosts. The IP software receives data segments from the TCP layer and ensures that each segment is sized properly to meet the requirements of the transmission path and physical adapters (such as Ethernet and CTC adapters). If necessary, IP changes the segment size by breaking the segment down into smaller IP datagrams, and then transmits the data to the physical network interface or layer of the host. [RFC 791, 1981]
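As a rough illustration of how a segment may be broken down to fit a transmission path, the following sketch computes IPv4-style fragment boundaries. It assumes a 20-byte IPv4 header without options and relies on the RFC 791 rule that fragment offsets are expressed in 8-byte units, so every fragment except the last carries a multiple of 8 data bytes. The function name `fragment_payload` is hypothetical.

```python
from typing import List, Tuple


def fragment_payload(payload_len: int, mtu: int, header_len: int = 20) -> List[Tuple[int, int, bool]]:
    """Return (offset_bytes, data_length, more_fragments) for each fragment.

    Assumes an IPv4 header of `header_len` bytes on every fragment. Data
    per fragment is rounded down to a multiple of 8 because fragment
    offsets are carried in 8-byte units.
    """
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("mtu too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len
        fragments.append((offset, length, more))
        offset += length
    return fragments


# Hypothetical example: 4000 bytes of upper-layer data over a 1500-byte MTU.
for frag in fragment_payload(4000, mtu=1500):
    print(frag)
```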
IP software, like that of other similar Internet layer protocols, is not designed for reliability. TCP expects IP to transmit the data immediately, so IP sends the data with no further checks. If actual transmission is delayed or incomplete, the data is discarded. Successfully transmitted data, however, is handed off to the TCP software of the receiving host, which uses its verification and acknowledgement systems to ensure that the requested data is received by the requesting client. If the TCP software of the sending host does not receive acknowledgement of a complete transmission, it retransmits the data. One consequence of this system is that retransmissions increase when a physical communication path becomes saturated or otherwise unavailable, which in turn increases CPU and network capacity consumption.
The large system effect occurs in processing systems that are designed to handle a specific set of conditions of finite size and complexity. When presented with conditions larger and more complex than expected, those systems no longer operate efficiently, or at all. To illustrate this effect, imagine a small town whose main cross streets meet at an intersection with a stop light that is timed to change at one-minute intervals, so that traffic flows efficiently at the expected traffic volume. Under normal operating conditions, the design works effectively, as the number of cars entering and leaving the town from any given direction fits within the design parameters. However, if the volume of traffic using the cross streets increases beyond the amount that can be handled during a one-minute light cycle, congestion will occur. The congestion will continue to worsen for as long as the excess traffic volume does not fall below the maximum number of cars that can pass through the intersection during the one-minute window. Therefore, if new cars entering the town continue to exceed the designed capacity, the traffic system will ultimately fail. The failure of a system in this manner is due to large system effects.
This type of systematic problem can be referred to as a non-linear system moving from ordered operation into chaos. In the previous example, the system moved from an ordered operation into chaos, because the growth of traffic is non-linear and the progression of the system operation is repetitive and does not correct for the change in non-linear conditions. While one would hope that a system could be designed to handle a multitude of changing and expanding criteria, the reality is far less certain because systems can only truly be designed to handle what can be reasonably envisioned.
The chaotic operation produced by the large system effect does not often arise through a smooth, gradual movement from order to chaos. Instead, chaos tends to appear at catastrophic breakpoints in system behavior. Even slow changes in a system's control parameters can result in a sudden shift to catastrophe. This type of phenomenon occurs in the water-ice transition at sea level pressure: as the temperature drops below the freezing point, water abruptly transitions to the solid state. Systems that are susceptible to large system effects may therefore exhibit sudden catastrophic behavior at intervals and without an observable smooth transition.
Large system effects may arise in computer networking systems, protocols, and implementations that employ algorithms which are efficient for small configurations or low transaction rates but inefficient for large configurations or high transaction rates. In the context of TCP/IP and network communications, TCP standards control the transmission rate of data streams between connected clients. As networked host processing capabilities and storage become ever more plentiful, the amount of data that clients request and transmit likewise increases. Many of today's clients require increasingly large data transfer rates that amplify tendencies toward network congestion, especially when considering the rapidly growing number of networked “smart” devices and the pervasiveness of the PC.
Current TCP implementations employ flow control mechanisms to ensure that the sending daemon does not transmit data faster than the receiving daemon can process the incoming stream. The standard defines an advertised window size, included in each acknowledgement, that indicates to the sending daemon the amount of data the receiving daemon is willing to accept. A TCP “advertised window” is a term used to describe, in part, a logical window used by a receiving daemon to limit the number of outstanding TCP segments in transmission at any given time; it represents the number of bytes that the remote sending client is authorized to send over an IP connection using the TCP protocol. The advertised window allows a receiving daemon to specify its buffer size every time it sends a segment or acknowledgement to the sending daemon. The advertised window and the highest acknowledged sequence number together yield the window end point, i.e., the sequence number of the byte following the last position in the receiving daemon's window.
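The end point calculation can be sketched as follows. The sketch assumes that the "highest acknowledged sequence number" is the value carried in the TCP acknowledgement field, that is, the next byte the receiver expects, and that sequence arithmetic wraps modulo 2^32. The function names are hypothetical.

```python
SEQ_SPACE = 2 ** 32  # 32-bit TCP sequence space


def window_end_point(ack_number: int, advertised_window: int) -> int:
    """Sequence number of the byte just past the receiver's window."""
    return (ack_number + advertised_window) % SEQ_SPACE


def usable_window(ack_number: int, advertised_window: int, next_seq_to_send: int) -> int:
    """Bytes the sender may still transmit before reaching the end point.

    Assumes next_seq_to_send lies between ack_number and the end point.
    """
    end = window_end_point(ack_number, advertised_window)
    return (end - next_seq_to_send) % SEQ_SPACE


# Hypothetical example: 8192-byte window, receiver expects byte 5000 next,
# and the sender has already transmitted up to (not including) byte 9000.
print(window_end_point(5000, 8192))
print(usable_window(5000, 8192, 9000))
```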
One rule is that this end point should never move backward (a shrinking window). Under normal circumstances, as data is received, it is acknowledged and the advertised window is extended further. If data arrives faster than it can be accommodated, it still must be acknowledged in a timely fashion, but the end point of the window is not advanced. Eventually, all of the data within the advertised window is transmitted, the end point is reached, and the window is closed. Once the window is closed, no more data will be accepted until it is reopened. A further rule is that when a window is reopened, it must be fully reopened to its maximum size.
TCP sending daemons also utilize a logical window referred to as a “retransmission window” that covers the bytes in the data stream that have been authorized for transmission (including sent and unsent bytes). Under normal circumstances, the size of the TCP retransmission window is set to and defined by the advertised window size. To increase overall transmission speed, TCP buffers well beyond the window size and maintains the advertised window at its maximum value with every acknowledgement. While this encourages an increase in data transmission, it also exposes the TCP protocol to the large system effect.
While the flow rate of the data streams being transmitted has increased, the size requirements of the actual packets of information transmitted on the IP network, for instance over the common physical Ethernet hardware layer, have not. The TCP Maximum Segment Size (MSS) option is preferably used to set the segment size to be no greater than the smallest Maximum Transmission Unit (MTU) of the network. Therefore, as larger and larger window sizes permit the transmission of larger sequence ranges of the data stream, the specific window of data transmitted must be broken into a greater number of segments, each no larger than the established MSS. TCP is a positive cumulative acknowledgement protocol, and therefore the greater number of segments being transmitted in a large window generates even more network traffic by increasing the number of potentially outstanding acknowledgements if acknowledgements are sent for each segment received.
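A short calculation shows how the segment count grows with the window while the segment size stays fixed. The 1460-byte MSS is an assumption (a 1500-byte Ethernet MTU less 20-byte IP and 20-byte TCP headers, with no options), and the window sizes are illustrative only.

```python
def segments_per_window(window_bytes: int, mss: int) -> int:
    """Number of segments needed to carry one full window (ceiling division)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return -(-window_bytes // mss)


MSS = 1460  # assumed: 1500-byte MTU minus 40 bytes of IP and TCP headers

for window in (8192, 65535, 1024 * 1024):
    print(window, segments_per_window(window, MSS))
```

If every segment were acknowledged individually, the same counts would also bound the number of acknowledgements generated per window.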
Furthermore, adjustments to a TCP implementation made to avoid over-transmission of acknowledgements mean that if congestion within the network causes the loss of a single segment of data, the entire window must often be retransmitted to repair the damage to the overall stream. [RFC 813] This retransmission causes a non-linear expansion of the traffic within the network and therefore results in additional packet loss and subsequently additional retransmissions. This catastrophic behavior is caused because TCP requires more data to be retransmitted than is ultimately required, causing congestion collapse. This large system effect is not corrected by the additional enhancements to IP that have been presented with IP version 6 (IPv6).
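The amplification described above can be made concrete with a small comparison between resending a whole window and resending only the lost data. This is a simplified model for illustration, not a description of any specific TCP implementation; the function name and the window and segment sizes are assumptions.

```python
def retransmitted_bytes(window_bytes: int, mss: int, lost_segments: int, whole_window: bool) -> int:
    """Bytes resent to repair `lost_segments` losses within one window.

    whole_window=True models resending the entire window after any loss;
    whole_window=False models resending only the segments that were lost.
    """
    if lost_segments <= 0:
        return 0
    if whole_window:
        return window_bytes
    return min(lost_segments * mss, window_bytes)


WINDOW, MSS = 65535, 1460  # assumed sizes for illustration

full = retransmitted_bytes(WINDOW, MSS, lost_segments=1, whole_window=True)
minimal = retransmitted_bytes(WINDOW, MSS, lost_segments=1, whole_window=False)
print(full, minimal, round(full / minimal, 1))
```

Under these assumptions a single lost segment causes roughly 45 times more data to be resent than was actually lost, and that extra traffic enters a network that is already congested.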
Attempts have been made to correct such potential congestion failures in a TCP/IP network. While the TCP specification itself does not provide for specific congestion control mechanisms, implementations may use TCP functionality to provide such mechanisms. For example, many TCP implementations use adaptive retransmission algorithms to vary the time period after which a sending daemon will retransmit the unacknowledged segments in its retransmission window.
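One well-known example of an adaptive retransmission algorithm is the smoothed round-trip time procedure given as an example in RFC 793. The sketch below follows that procedure; the specific constants (ALPHA of 0.875, BETA of 2.0, bounds of 1 and 60 seconds) are assumptions chosen from within the ranges the RFC suggests, and real implementations may use different estimators.

```python
from typing import Tuple


def update_rto(srtt: float, rtt_sample: float, alpha: float = 0.875, beta: float = 2.0,
               lbound: float = 1.0, ubound: float = 60.0) -> Tuple[float, float]:
    """Update the smoothed RTT and derive a retransmission timeout (seconds).

    SRTT = ALPHA * SRTT + (1 - ALPHA) * RTT
    RTO  = min(UBOUND, max(LBOUND, BETA * SRTT))
    """
    srtt = alpha * srtt + (1 - alpha) * rtt_sample
    rto = min(ubound, max(lbound, beta * srtt))
    return srtt, rto


# Hypothetical round-trip samples that grow as the path becomes congested.
srtt = 0.5
for sample in (0.5, 1.0, 2.0, 4.0):
    srtt, rto = update_rto(srtt, sample)
    print(sample, srtt, rto)
```

As measured round-trip times rise, the timeout lengthens, so the sender waits longer before concluding that a segment was lost.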
In addition to delaying retransmissions when network latency begins to increase due to congestion, later TCP standards included several methods of congestion avoidance, including slow start, additive increase, and multiplicative decrease congestion avoidance algorithms. [RFC 2001] These algorithms are used by a sending daemon to track a congestion window size that, if smaller than the receiving daemon's advertised window size, is used to limit the sequence range being sent. Because they are conservative congestion-estimating algorithms, however, implementing these and other similar algorithms can significantly reduce data transfer rates by unnecessarily restricting the retransmission window.
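The interaction between the congestion window and the advertised window can be sketched with a simplified, per-round-trip model. This is an illustrative approximation only: real implementations update the congestion window per acknowledgement and distinguish several kinds of loss signal. The function names, the loss response (restart from one segment), and the sizes used are assumptions.

```python
from typing import Tuple


def next_cwnd(cwnd: int, ssthresh: int, mss: int, loss: bool) -> Tuple[int, int]:
    """Advance the congestion window by one round trip.

    Returns the new (cwnd, ssthresh) pair, both in bytes.
    """
    if loss:
        # Multiplicative decrease: halve the threshold, restart from one segment.
        return mss, max(cwnd // 2, 2 * mss)
    if cwnd < ssthresh:
        # Slow start: double each round trip, up to the threshold.
        return min(cwnd * 2, ssthresh), ssthresh
    # Congestion avoidance: additive increase of one segment per round trip.
    return cwnd + mss, ssthresh


def send_window(cwnd: int, advertised_window: int) -> int:
    """The sender is limited by the smaller of the two windows."""
    return min(cwnd, advertised_window)


MSS, ADVERTISED = 1460, 32768  # assumed sizes for illustration
cwnd, ssthresh = MSS, 65535
for rtt, loss in enumerate([False, False, False, False, True, False, False, False, False]):
    cwnd, ssthresh = next_cwnd(cwnd, ssthresh, MSS, loss)
    print(rtt, cwnd, ssthresh, send_window(cwnd, ADVERTISED))
```

After the simulated loss, the sender needs several round trips to climb back toward the advertised window, which illustrates how a conservative estimate can hold throughput below what the receiver would accept.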
Other optional TCP functions, such as the selective acknowledgement option, have been introduced to decrease the probability of duplicate data retransmission. The selective acknowledgement option allows a receiving daemon to specify several blocks of discontinuous data that have been received with sequence numbers higher than the sequence numbers in one or more segments that have not been received. The sending daemon may then construct a retransmission that does not include the data in the blocks received out of order. [RFC 2018] While useful, the selective acknowledgement option is limited by the fact that a selective acknowledgement must describe each discontinuous data block by a pair of bounding 32-bit sequence numbers, which in practice limits the option to three or four discontinuous blocks of data. Therefore, beyond the first three or four lost segments in a window transmission, the retransmitted data will be duplicative.
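The three-or-four block limit follows from the space available for TCP options. The calculation below assumes the standard 40-byte maximum for TCP options, a 2-byte kind and length prefix for the selective acknowledgement option, and 8 bytes per block (two 32-bit sequence numbers). The 12-byte figure for a padded timestamp option is the commonly cited case and is included as an assumption.

```python
MAX_TCP_OPTION_BYTES = 40  # 60-byte maximum TCP header minus the 20-byte fixed part
SACK_PREFIX_BYTES = 2      # option kind + option length
SACK_BLOCK_BYTES = 8       # left edge + right edge, 32 bits each


def max_sack_blocks(other_option_bytes: int = 0) -> int:
    """Largest number of SACK blocks that fit alongside other options."""
    available = MAX_TCP_OPTION_BYTES - other_option_bytes - SACK_PREFIX_BYTES
    return max(0, available // SACK_BLOCK_BYTES)


print(max_sack_blocks())    # no other options in the header
print(max_sack_blocks(12))  # assumed: timestamp option padded to 12 bytes
```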
With regard to IPv6 in particular, the new IP protocols remove all IP-level fragmenting from intermediate nodes on a path. Fragmentation at this level may only be performed at the source. After the IP layer receives packets of data from the upper layer (e.g., TCP), it ensures that the packet size meets the requirements of the transmission path and physical adapters such as Ethernet or other hardware interfaces. If necessary, the source IP daemon reduces the packet size by fragmenting the datagram prior to transmission over the link layer of the network path between the remote hosts. Importantly, the IPv6 fragmentation protocol requires that the receiving host hold datagram fragments for a period of time; if all fragments are not received within the reassembly time, they are discarded and an error message is dispatched requesting that the entire datagram be retransmitted.
Since IPv6 has been designed to replace existing IPv4 networks, there is a historical artifact of the previous network definition that has a significant effect upon the efficiency of the new IPv6 implementation. Within any network, the sizes of data transmissions are ultimately limited to the smallest transmission size that can be accepted by all links between the communicating nodes. For example, if the network path from a server to a computer includes an Ethernet segment, then all transmissions must fit within the 1500-byte limitation imposed by Ethernet, regardless of how much larger the size capabilities of the rest of the network pathway may be. However, the length of an IP datagram is not merely the amount of upper-layer data payload being sent; it also includes the length of the IPv4 or IPv6 datagram header. Since IPv6 datagram headers are significantly longer than those of the older IPv4 protocol (a fixed 40 bytes, compared with a minimum of 20 bytes for IPv4), networks that are configured for IPv4 can incur a significant amount of datagram fragmentation. In other words, on a network that is defined to carry IPv4 data traffic, the larger IPv6 datagrams are routinely fragmented despite a preference for fragmentation avoidance. This problem is transparent to the end user and is recognizable only through network congestion and the accompanying large computational demands.
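The header-size effect can be shown with simple arithmetic. The sketch assumes a 20-byte IPv4 header without options, the 40-byte fixed IPv6 header without extension headers, and a 1500-byte Ethernet MTU; the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options
IPV6_HEADER_BYTES = 40  # fixed IPv6 header, no extension headers
ETHERNET_MTU = 1500


def max_upper_layer_payload(mtu: int, header_bytes: int) -> int:
    """Largest upper-layer payload that fits in one unfragmented datagram."""
    return mtu - header_bytes


def needs_fragmentation(payload_bytes: int, mtu: int, header_bytes: int) -> bool:
    """True if payload plus IP header exceeds the link MTU."""
    return payload_bytes + header_bytes > mtu


# A payload sized to fill an IPv4 datagram exactly on Ethernet.
payload = max_upper_layer_payload(ETHERNET_MTU, IPV4_HEADER_BYTES)
print(payload)
print(needs_fragmentation(payload, ETHERNET_MTU, IPV4_HEADER_BYTES))
print(needs_fragmentation(payload, ETHERNET_MTU, IPV6_HEADER_BYTES))
```

A payload tuned to the IPv4 limit therefore overflows the same link by 20 bytes once it is carried in an IPv6 datagram.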
IPv6 is capable of fragmenting a complete datagram into a number of pieces, limited only by the practical structures of the software involved. Therefore, an IPv6 datagram that is fragmented might well be broken into a significant number of pieces. These fragmented parts of the original IPv6 datagram must be reassembled only by the final destination. In the IPv4 implementation of fragmentation, a datagram was fragmented only when presented with a portion of the network that was not capable of handling the transmission, rather than requiring fragmentation to the smallest unit for the entire path. This change has caused the maximum amount of fragmentation to be applied to the entire data being transmitted, thereby extending the issue of packet loss vulnerability to the largest amount of exposure possible, rather than limiting the exposure only to short segments of the entire data path.
For example, if a transmission needs to pass through fifty gateways to reach its final destination, an IPv6 transmission must be fragmented to a size that will fit within all fifty-one segments of the network path. However, in IPv4, this fragmentation issue was only a problem for whichever of the fifty-one segments required the limitation. Since IPv6 forces fragmentation upon the entire network path, any packet loss anywhere within the fifty-one segments has an effect upon the entire transmission.
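The fifty-one segment example can be sketched as a path MTU calculation followed by source fragmentation. The sketch assumes each fragment carries the 40-byte IPv6 header plus the 8-byte Fragment header, that fragment data is rounded down to a multiple of 8 bytes, and that no other extension headers are present. The link MTU values and function names are hypothetical.

```python
from typing import Sequence

IPV6_HEADER_BYTES = 40
FRAGMENT_HEADER_BYTES = 8


def path_mtu(link_mtus: Sequence[int]) -> int:
    """The source must size fragments for the smallest link on the path."""
    return min(link_mtus)


def ipv6_fragment_count(payload_bytes: int, pmtu: int) -> int:
    """Fragments needed when the source fragments to the path MTU."""
    per_fragment = (pmtu - IPV6_HEADER_BYTES - FRAGMENT_HEADER_BYTES) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("path MTU too small to carry fragment data")
    return -(-payload_bytes // per_fragment)  # ceiling division


# Hypothetical path: fifty links with a 9000-byte MTU and one 1500-byte link.
links = [9000] * 50 + [1500]
pmtu = path_mtu(links)
print(pmtu, ipv6_fragment_count(8000, pmtu))
```

Because the fragments are created at the source, all of them traverse every one of the fifty-one links, including those that could have carried the datagram whole.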
When packet loss occurs for a fragmented piece of a larger IPv6 datagram, the protocol requires that the entire IPv6 datagram be retransmitted to correct for the loss of a single fragment. [RFC 2460] Since in some use cases this retransmission occurs only after reassembly has failed to complete within 60 seconds, significant time is lost in the overall transmission process. This encourages an increase in data retransmission and also opens the IPv6 protocol to the large system effect. While the amount of data being transmitted has increased, the size requirements of the actual packets of information being transmitted have not. Furthermore, if congestion within the network causes the loss of a single piece of data, the entire datagram must be retransmitted to repair the damage to the overall stream. This retransmission causes a non-linear expansion of the traffic within the network and therefore results in additional packet loss and, subsequently, additional retransmissions. This catastrophic behavior arises because IPv6 requires more data to be retransmitted than is actually needed.
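The exposure created by fragmentation can be estimated under a simple assumption that is not stated in the text above: each fragment is lost independently with the same probability p. A datagram split into n fragments then fails to reassemble with probability 1 - (1 - p)^n. The loss rate and fragment counts below are illustrative values only.

```python
def datagram_loss_probability(fragment_loss_rate: float, fragment_count: int) -> float:
    """Probability that at least one of n independent fragments is lost.

    Assumes independent, identically distributed fragment loss, which is
    a modelling simplification; real losses under congestion are often
    correlated.
    """
    if not 0.0 <= fragment_loss_rate <= 1.0:
        raise ValueError("fragment_loss_rate must be between 0 and 1")
    if fragment_count < 1:
        raise ValueError("fragment_count must be at least 1")
    return 1.0 - (1.0 - fragment_loss_rate) ** fragment_count


for n in (1, 6, 45):
    print(n, round(datagram_loss_probability(0.01, n), 4))
```

Under this model, the chance that the whole datagram must be resent grows with every additional fragment, even though the per-fragment loss rate is unchanged.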
Thus, there exists a need in the prior art for improved fragmentation rules for protocols that restrict intermediate node fragmentation, in order to avoid large system effects.