Computer networks are often described in terms of the OSI (Open Systems Interconnection) Reference Model, which defines seven functional layers. The layers range from a physical layer, which carries electrical, optical, or other signals, up to an application layer, which includes programs that interact directly with human users. The middle layer, known as the transport layer, separates the user-oriented upper layers from the network-oriented lower layers. The transport layer is responsible for dividing data into packets and transporting those packets, with the aid of the network-oriented layers, from a sender to a receiver.
People who design and implement the transport layer of a network generally attempt to balance several competing objectives. Data packets should be transmitted from the sender to the receiver as quickly as possible without congesting the network, without dropping any data packets along the way, and without overflowing the buffer space that holds the packets before the receiver forwards or otherwise processes them. Data packets sometimes travel along different paths through the network, but the data must still be presented to the final receiver in the proper order.
Transport layers face the further challenge that real-world networks do not present a consistent environment. The available bandwidth, response time, buffer space, and data loss rates of a typical network change frequently. User-oriented layers above the transport layer may also dictate changes in transport layer behavior by altering packet priorities. A transport layer tries to respond to user requests and to changing network conditions quickly, efficiently, and in a predictable manner.
A variety of transport layers are presently used, including layers which implement TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) on UNIX systems, protocols TP0 through TP4 of the OSI model, SPX (Sequenced Packet Exchange) and IPX (Internet Packet Exchange) in Novell NetWare systems (SPX, IPX, NOVELL, and NETWARE are trademarks of Novell, Inc.), and other protocols.
TCP, TP0 through TP4, and SPX are normally connection-oriented approaches, while IPX and UDP support connectionless service (TP4, although itself connection-oriented, is designed to run over a connectionless network service). In either form of communication, packets may take different paths between the sender and the receiver. No state is maintained during connectionless communication (other than the transmitted data). Data received is not acknowledged and there is no capability for resending lost data.
By contrast, during connection-oriented communication the endpoints maintain state information to track the results of data transmissions. Unless certain real time protocols are used, receivers also acknowledge the data they receive and senders retransmit data that was not received. The data packets or datagrams used are typically assigned sequence numbers; the receiver places the data back in its original order before presenting it to the application.
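The reordering step described above can be illustrated with a minimal Python sketch. The function name `reorder`, the `(sequence number, payload)` tuple format, and the choice to silently discard duplicates are assumptions made for illustration; they are not taken from any particular protocol implementation.

```python
def reorder(packets, first_seq=0):
    """Return payloads in sequence order from (seq, payload) pairs
    that may arrive out of order. Hypothetical helper for illustration."""
    pending = {}          # packets that arrived ahead of their turn
    expected = first_seq  # next sequence number owed to the application
    delivered = []
    for seq, payload in packets:
        if seq < expected or seq in pending:
            continue      # duplicate of something already seen; ignore it
        pending[seq] = payload
        # Release every packet that is now contiguous with what was delivered.
        while expected in pending:
            delivered.append(pending.pop(expected))
            expected += 1
    return delivered
```

A packet that arrives ahead of a missing predecessor simply waits in `pending` until the gap is filled, so the application never sees data out of order.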
TCP and TP0 through TP4 use a window sizing method to control the flow of packets and reduce congestion. A sending endnode on the network has a window size that defines the maximum number of unacknowledged packets allowed. It is assumed that the network will be busy but not congested if the sending endnode is not allowed to send "too many" additional packets before receiving acknowledgments ("acks") that packets sent earlier have indeed reached the receiving endnode. The window size determines how many additional packets are "too many." In some transport layers, the window size is set during initial system configuration and remains fixed thereafter. In other transport layers, an attempt is made to adapt the window size to changing network conditions.
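As a rough sketch of the window sizing idea, the following Python class tracks only a count of unacknowledged packets against a fixed window size. The class and method names are hypothetical, and real transport layers track individual sequence numbers rather than a bare counter.

```python
class WindowSender:
    """Minimal fixed-window sender (illustrative assumption, not a real API)."""

    def __init__(self, window_size):
        self.window_size = window_size  # maximum unacknowledged packets allowed
        self.unacked = 0                # packets sent but not yet acknowledged

    def can_send(self):
        # The window is "open" while fewer than window_size packets are unacked.
        return self.unacked < self.window_size

    def send(self):
        """Try to place one packet on the network; return True on success."""
        if not self.can_send():
            return False                # window is closed
        self.unacked += 1
        return True

    def on_ack(self):
        """An ack arrived, so one slot in the window opens again."""
        if self.unacked > 0:
            self.unacked -= 1
```

With a window size of two, a third `send()` is refused until `on_ack()` has been called at least once.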
TCP is perhaps the most widely used transport layer protocol, and is the presently favored transport protocol on the Internet. TCP attempts to balance competing transport layer objectives in the following manner. After a connection is established, a maximum sending endnode window size is determined by the receiving endnode and a current sending endnode window size is determined by the sending endnode. The current window size will be less than or equal to the maximum sending window size.
One form of TCP uses a "slow start" method for altering the current window size. Initially, the sending endnode's window size for sending is one packet. After the first data packet is placed on the network by the sending endnode, the window "closes" because the number of unacknowledged packets equals the window size. If all goes well, after one packet round trip time the sending endnode receives an ack from the receiving endnode. At this point the window opens again, and the window size might be increased.
The slow start method increases window size rapidly up to a threshold and then slowly thereafter. The "slow" in "slow start" refers to the fact that the initial small window sizes only allow relatively slow data transmission. Slow increases in window size increment the window size once for each group of several acks received, while rapid increases increment the window size by one for each ack received. In either case, the window size is increased until one of the following occurs: a predefined maximum window size is reached, a packet is lost, or there is no more data ready to send.
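The two growth phases can be sketched as follows. This is a simplified model under stated assumptions: the window is counted in whole packets, no packets are lost, data is always ready to send, and the parameter `group_size` (the number of acks required per increment during the slow phase) is a hypothetical stand-in for "a group of several acks."

```python
def simulate_growth(num_acks, threshold, max_window, group_size):
    """Return the window size before any ack and after each of num_acks acks."""
    window = 1
    acks_in_group = 0
    history = [window]
    for _ in range(num_acks):
        if window >= max_window:
            pass                        # predefined maximum reached; stop growing
        elif window < threshold:
            window += 1                 # rapid phase: one increment per ack
        else:
            acks_in_group += 1          # slow phase: one increment per group
            if acks_in_group == group_size:
                window += 1
                acks_in_group = 0
        history.append(window)
    return history
```

For example, `simulate_growth(10, threshold=4, max_window=6, group_size=3)` grows the window by one on each of the first three acks, then by one for every third ack, and stops growing once the window reaches six.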
If no packets are lost and the predefined maximum window size is not too large and a steady flow of data is ready to send, then TCP will reach a steady state. In the steady state, one new data packet is placed on the network for transmission each time an ack is received. The time lapse between each packet transmission is determined by the rate at which acks arrive at the sending endnode. As long as the network bandwidth and latency remain consistent, this steady pacing of packets helps avoid congestion and minimizes the risk that packets will be lost through buffer overflows.
However, many networks do not have constant bandwidth and latency, and some do not have equal latency in both directions. In some network environments, such as satellite channels, data is bunched together and then transmitted in a burst. Similar bunching may occur on heavily loaded public networks that use frame relay methods. In such cases, acks will be compressed in bunches with large gaps between each bunch. TCP relies on continuous data and ack transfers for best performance. When ack compression causes acks to arrive in bursts, TCP does not efficiently use the available bandwidth between the bursts.
TCP performance depends heavily on choosing an appropriate value for the threshold and the maximum window size. If either the maximum window size or the threshold is too small, then TCP will place less data on the network than it should for optimum performance.
If the predefined maximum window size is too large, then TCP will drop packets, overload the network, or both. The current window size will grow toward the maximum window size until it exceeds the window size that is optimal for the link and TCP will then place "extra" packets on the network. Extra packets are those which overload the network by exceeding the optimal number of packets given the available bandwidth and buffer space. What happens as a result of the extra packets depends on whether buffer space is available at nodes between the sending endnode and the receiving endnode.
If the slowest node in the route between the sending endnode and the receiving endnode does not have enough buffer space to hold the extra packets, it will drop the packets. Dropped packets cause TCP to go through a time-out and retransmit process. Dropped packets also cause a reduction in window size; some versions of TCP reduce the window size by half for each dropped packet, while others simply start over with a current window size of one. In either version, TCP then repeats the same cycle: it grows the window, places extra packets on the network, loses a packet, and reduces the window again. This cycle severely reduces packet throughput.
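The grow-and-drop cycle can be illustrated with a toy model. The assumptions here are deliberately crude: the window grows by one packet per round, a single hypothetical `capacity` value stands in for the optimal window of the link, and a drop occurs in any round in which the window exceeds that capacity. The policy names "halve" and "reset" are labels chosen for this sketch.

```python
def on_packet_dropped(window, policy):
    """Return the reduced window size after a dropped packet."""
    if policy == "halve":
        return max(1, window // 2)   # cut the window in half, never below one
    if policy == "reset":
        return 1                     # start over with a window of one
    raise ValueError("unknown policy: " + policy)


def sawtooth(rounds, capacity, policy):
    """Return the window size used in each round of the toy model."""
    window = 1
    trace = []
    for _ in range(rounds):
        trace.append(window)
        if window > capacity:
            # Extra packets were placed on the network and one was dropped.
            window = on_packet_dropped(window, policy)
        else:
            window += 1
    return trace
```

Under either policy the trace never settles at the capacity; it repeatedly overshoots and falls back, which is the throughput-reducing pattern described above.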
If the slowest intermediate node does have enough buffer space to hold the extra packets, it will not drop the packets. However, the network will be congested because the extra packets will precede packets transmitted by other sending endnodes. Moreover, if enough packets from other sending endnodes arrive at the slow intermediate node, those other packets will be dropped, causing the throughput-reducing cycle described above.
TCP performance also depends heavily on the method used for altering window size. For instance, when there is no more data ready to send, TCP sets the current window size back to one and starts increasing it once more toward the predefined maximum. If the window size were still at or near the maximum when data became available, packets would be placed on the network in rapid succession until the window closed. Such a sudden burst of packets could cause congestion that interferes with other network users, and packets at the end of the burst might be dropped. As noted, recovery from dropped packets seriously degrades network performance. Accordingly, the window size is reset to one each time the available data has all been transmitted.
Unfortunately, resetting the window size to one for each new block of data prevents TCP from fully utilizing the available bandwidth when data is intermittent. Common configurations that generate intermittent data include distributed operating systems, message passing systems, remote procedure calls, distributed database locks, distributed file systems, client/server cache consistency protocols, other request/reply data, and other data that is created in response to distributed events. TCP does not provide optimal throughput for such data because the data tends to arrive at the TCP layer intermittently rather than continuously, which forces the window size to be reset to one and wastes bandwidth.
Some transport layers use a "packet metering" protocol instead of a window sizing protocol. Unlike window sizing, which limits the number of unacknowledged packets allowed, packet metering limits the rate at which packets are placed on the network for transmission. Packet metering assumes that the network will be busy but not congested if the sending endnode is not allowed to send packets "too quickly." The packet metering rate determines how quickly is "too quickly." In some transport layers, the packet metering rate is set during initial system configuration and remains fixed thereafter. In other transport layers, an attempt is made to adapt the metering rate to changing network conditions.
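By way of contrast with window sizing, a fixed-rate packet meter can be sketched as a function that spaces transmissions at least one metering interval apart. The function name, the use of seconds as the time unit, and the assumption that `ready_times` is sorted in non-decreasing order are all choices made for this illustration.

```python
def metered_send_times(ready_times, packets_per_second):
    """Return the time at which each packet is placed on the network.

    ready_times lists when each packet becomes available to send
    (assumed sorted); no two packets are sent closer together than
    1 / packets_per_second seconds.
    """
    interval = 1.0 / packets_per_second
    send_times = []
    next_allowed = 0.0
    for ready in ready_times:
        # Send as soon as the data is ready, but never faster than the meter allows.
        t = max(ready, next_allowed)
        send_times.append(t)
        next_allowed = t + interval
    return send_times
```

Three packets that all become ready at time zero are spread out by the meter rather than sent as a burst, while a packet that becomes ready after an idle period is sent immediately with no restart penalty.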
Window sizing and packet metering have different advantages relative to one another. If the metering rate is set correctly, packet metering prevents bursts caused by intermittent availability of data. As noted above, TCP window sizing avoids congestion but also wastes bandwidth by starting each new block of data with a small window size. By contrast, optimal packet metering transmits newly arrived data at a rate that causes neither congestion nor packet loss, and it does so without wasting bandwidth. Properly metered packets are transmitted at the highest rate possible without flooding the network.
However, one advantage of window sizing relative to packet metering is that window sizing tends to provide better limits on the number of extra packets that are transmitted after packets are dropped. Window sizing stops placing packets on the network immediately when an expected ack does not arrive. By contrast, packet metering relies on rate measurements taken with respect to two or more packets and thus responds more slowly to missing acks. In one packet metering system, for example, information regarding both the rate at which packets arrive at the receiving endnode and the rate at which they are removed from the receiving endnode's buffer for processing is transmitted to the sending endnode at intervals. During such an interval, extra packets continue to be placed on the network.
Many existing transport layers are statically configured with parameters such as a maximum window size, a slow start threshold separating rapid window growth from slower window growth, or a default packet metering rate. These transport layers respond inadequately to changing network conditions. As noted above, neither window sizing nor packet metering systems perform well if their defining parameters do not match network conditions.
Unfortunately, it is difficult even for network administrators to determine which window sizing or packet metering parameter values to use in a given network. The network may include a variety of environments, such as Ethernet, satellite, FDDI, modem, token ring, and so on. At a minimum, the administrator must understand how TCP and any other protocols in use behave in each network environment, must know how to configure systems by installing parameter values in them, and must have the tools needed to analyze network behavior. Knowing the maximum buffer space available on each intermediate node is also very helpful. These prerequisites may be difficult to satisfy.
But even when such tools and information are available, the administrator faces the harsh reality that most network environments change frequently. Traffic generated by users changes the available network bandwidth and hence the optimum window size or packet metering rate. Nodes come up and nodes go down. Hardware is exchanged for other hardware that has a different buffer size. New links having different packet propagation characteristics are added, as when satellite links are first connected to a network. Such changes often make particular maximum window sizes or metering rates obsolete soon after they are installed.
The TCP slow start protocol attempts to overcome the drawbacks of static configurations by altering the window size in response to changes in the network. As described above, however, this attempt is not entirely successful because slow start depends on statically configured parameters such as the threshold and maximum window sizes and because window sizing does not use bandwidth efficiently when data is supplied intermittently.
Thus, it would be an advancement in the art to provide a novel method for controlling packet transmissions which responds efficiently to changing network conditions.
It would also be an advancement to provide such a method which combines positive aspects of window sizing and packet metering.
Because of the enormous investment in existing networks, it would be a further advancement to provide such a method which can be used by simply replacing the transport layer of an existing network with a transport layer that implements the novel method.
Such a method for controlling communication over a computer network is disclosed and claimed herein.