As noted above, the Transmission Control Protocol/Internet Protocol (TCP/IP) is a frequently used transport/network layer protocol suite of digital communications networks such as the Internet. TCP is regarded as a relatively reliable data transport protocol. That is, a sending system can detect whether data has been successfully received at its destination and, if not, can take steps to ensure that it is. Once a packet arrives at its destination, the receiving system sends an acknowledgement (ACK) message for that packet back to the sender. When the sender receives the ACK message, it knows that the original packet was safely received.
Often, however, a packet will be corrupted in transmission. This may be due to a noisy transmission channel or some other reason. Further, although the packet may properly reach its destination, the ACK message sent in return may not be received by the sender for similar reasons.
Similarly, a packet sent from the sending system or its return ACK message may be lost in transit. This communication problem can be detected by establishing a time period which begins when each packet is sent. If a corresponding ACK message is not received within that time period, the packet is resent.
In any case, the TCP protocol attempts to remedy the communication problem by resending the packet. If a proper ACK message still is not received, the packet is sent repeatedly, at ever-increasing intervals, until a proper ACK is received or an application timeout value is exceeded.
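The retransmission behavior described above can be sketched as a simple schedule. This is a minimal illustration, not TCP's actual timer logic: the function name, the doubling factor, and the cap `max_rto` are assumptions chosen for the example, since the text says only that the intervals are ever-increasing.

```python
def retransmission_schedule(initial_rto, max_rto, app_timeout):
    """Return the times (seconds after the first send) at which a packet
    would be resent if no ACK ever arrives.

    Assumption: the interval doubles after each attempt, up to max_rto.
    """
    times = []
    rto = initial_rto
    elapsed = rto  # the first retransmission fires when the first timer expires
    while elapsed <= app_timeout:
        times.append(elapsed)
        rto = min(rto * 2, max_rto)  # back off: wait longer before the next try
        elapsed += rto
    return times


if __name__ == "__main__":
    # With a 1 s initial timer and a 30 s application timeout,
    # retransmissions occur 1, 3, 7 and 15 seconds after the first send.
    print(retransmission_schedule(initial_rto=1, max_rto=64, app_timeout=30))
```

Under these assumptions, the gaps between attempts grow as 1, 2, 4 and 8 seconds, and the sender gives up once the next attempt would fall after the application timeout.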
Although this retransmission feature provides a valuable data integrity function, it does so at the expense of bandwidth. That is, each retransmitted packet sent by the TCP layer occupies a segment of bandwidth that could have carried a new packet. When the number of retransmissions is small, the lost bandwidth is negligible and system performance is not significantly affected. As the number of retransmissions rises to become a significant portion of the connection traffic, perhaps with multiply-retransmitted packets, effective connection traffic becomes a small percentage of its maximum value. This condition is known as congestion collapse.
To prevent such occurrences, four related algorithms (slow start, congestion avoidance, fast retransmit and fast recovery) have been incorporated into TCP/IP. The first, slow start, is implemented so that a newly established connection does not overwhelm the network by generating more traffic than the network can absorb on a specific route. Slow start represents flow control by the source for the purpose of maintaining network stability. By contrast, a sliding window protocol achieves flow control by the receiver for the purpose of minimizing the loss of data caused by buffer overflow.
More specifically, for each connection TCP remembers the size of the receiver's window rwnd as provided in ACK messages and a limit cwnd called the congestion window. The congestion window cwnd is a sender-side limit on the amount of data the sender can transmit into the network before receiving an ACK message. The sender's window is always the minimum of the receiver's window (the size of the receiver's buffer, i.e., the amount of new traffic it can accommodate) rwnd and the congestion window cwnd. At non-congested steady state, the receiver window and congestion window are the same size. In congested conditions, reducing the congestion window reduces the traffic the TCP layer will inject into the connection.
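The relationship between the two windows can be sketched in a few lines. The function names and the use of segment counts as the unit are assumptions made for illustration; only rwnd and cwnd come from the description above.

```python
def sender_window(rwnd, cwnd):
    """Effective send window: the smaller of the receiver's advertised
    window (rwnd) and the sender's congestion window (cwnd)."""
    return min(rwnd, cwnd)


def can_send(in_flight, rwnd, cwnd, segment=1):
    """True if one more segment fits in the effective window.

    in_flight is the amount of data sent but not yet acknowledged,
    in the same units as rwnd and cwnd (hypothetical bookkeeping).
    """
    return in_flight + segment <= sender_window(rwnd, cwnd)


if __name__ == "__main__":
    # The receiver can accept 8 segments but congestion control allows 4,
    # so the sender is limited to 4 unacknowledged segments.
    print(sender_window(rwnd=8, cwnd=4))
    print(can_send(in_flight=3, rwnd=8, cwnd=4))
    print(can_send(in_flight=4, rwnd=8, cwnd=4))
```

Because the minimum is taken on every send decision, shrinking cwnd throttles the sender even when the receiver still has buffer space.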
Whenever a TCP connection loses a packet, receives a corrupt packet or the like, this may represent the onset of a congestion condition. In this case, the sender reduces the congestion window cwnd by half, to a minimum of a single segment. A slow start threshold variable ssthresh will be set with this value; specifically, ssthresh = max{2, min{cwnd/2, rwnd}}. For segments that remain in the allowed window, the retransmission timer will be backed off exponentially upon continued failures. Because the congestion window is halved on each loss, it shrinks exponentially under continued loss.
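The loss response can be sketched as follows, using the ssthresh expression exactly as given above and working in whole segments. The function names and the integer rounding are assumptions made for the example.

```python
def on_loss(cwnd, rwnd):
    """Return (new_cwnd, ssthresh) in segments after a detected loss.

    Uses the formula from the text: ssthresh = max{2, min{cwnd/2, rwnd}}.
    Assumption: windows are whole segments, so cwnd/2 is rounded down.
    """
    half = cwnd // 2
    ssthresh = max(2, min(half, rwnd))
    new_cwnd = max(1, half)  # halve, but never below a single segment
    return new_cwnd, ssthresh


def loss_sequence(cwnd, rwnd, losses):
    """Trace cwnd across a run of consecutive losses."""
    history = [cwnd]
    for _ in range(losses):
        cwnd, _ssthresh = on_loss(cwnd, rwnd)
        history.append(cwnd)
    return history


if __name__ == "__main__":
    # Starting from 16 segments, five consecutive losses give
    # 16, 8, 4, 2, 1, 1: the window collapses within a few events.
    print(loss_sequence(cwnd=16, rwnd=32, losses=5))
```

The trace shows why the transmission rate falls so sharply: four losses in a row are enough to reduce a 16-segment window to a single segment.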
When congestion ends, i.e., a certain number of ACK messages are received in a row or some other criteria are satisfied, TCP begins the slow start procedure. Here, the congestion window starts at the size of a single segment and is increased by one segment each time an acknowledgement arrives; that is, each ACK message received allows two packets to be sent, one replacing the acknowledged packet and one filling the enlarged window. This continues until the window is equal to ssthresh. Afterwards, slow start ends and the second procedure, congestion avoidance, begins, in which the window is increased by roughly one segment per round-trip time, i.e., once a full window of packets has been acknowledged.
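The two growth phases can be illustrated with a per-round-trip simulation. This is a simplified sketch under stated assumptions: every packet in a round is acknowledged, the receiver window is never the limit, growth is counted once per round trip, and congestion avoidance follows the conventional rule of about one added segment per round trip. The function name is hypothetical.

```python
def window_growth(ssthresh, rounds, initial_cwnd=1):
    """Return cwnd (in segments) at the start of each round trip.

    Below ssthresh the window grows by one segment per ACK, which doubles
    it every round trip (slow start). At or above ssthresh it grows by one
    segment per round trip (congestion avoidance, as assumed here).
    """
    cwnd = initial_cwnd
    trace = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start ends at ssthresh
        else:
            cwnd += 1  # linear growth after the threshold
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # With ssthresh = 8: doubling (1, 2, 4, 8), then linear (9, 10, 11).
    print(window_growth(ssthresh=8, rounds=6))
```

The trace makes the change of slope at ssthresh visible: exponential growth up to the threshold, then one segment per round trip.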
While the slow start procedure provides an effective way of avoiding congestion collapse, the transmission rate is cut drastically upon loss of a packet. This may be acceptable if the goal is conservative use of a public network; however, it is less than preferable for a private network in which access to bandwidth by applications can be controlled. This is because, e.g., a private network may be able to be more aggressive due to its relatively controlled environment, whereas public platforms must ramp up from a relatively low level due to the unknown nature of sources delivering information to the network.
That is, in a public network the number of users trying to send information at one time cannot be controlled; thus, the chance of users overloading the network during busy periods is significant. In a private network, on the other hand, the number of users can be controlled; further, information about the bandwidth those users will need is available. Thus, it may be possible to predict in advance the level of traffic and the size of the network needed, so the danger of congestion is significantly less. In, e.g., signaling networks such as SS7, the “users” are telephone switches, and the number of these and the bandwidth that they use for signaling are predictable.
Also, when slow start is used to control the flow of data into a newly opened connection, traffic cannot ramp up to the desired rate as quickly as the network might otherwise allow. Further, if, for example, two connections are used for redundancy, when one path fails it is not possible to immediately transfer the full traffic load to the other path; it is necessary to go through the slow start process.
This is particularly evident in a redundant network having a primary and a backup link. If the primary fails, slow start prevents the full traffic load from being transferred to the backup immediately. Instead, traffic can be increased on the backup only at the rate allowed by slow start, even if the network is pre-configured to allow some reserve bandwidth for the backup link.
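The cost of this ramp-up can be estimated with a rough calculation. The sketch below assumes the backup connection starts at a window of one segment, that the window doubles every round trip with no losses, and that ssthresh is at least as large as the target window; the function names and the example figures are illustrative only.

```python
def slow_start_rounds(target_window, initial_cwnd=1):
    """Number of round trips of doubling needed before the window
    reaches target_window (in segments)."""
    rounds = 0
    cwnd = initial_cwnd
    while cwnd < target_window:
        cwnd *= 2
        rounds += 1
    return rounds


def ramp_up_delay(target_window, rtt_seconds, initial_cwnd=1):
    """Approximate time before the backup link carries the full load."""
    return slow_start_rounds(target_window, initial_cwnd) * rtt_seconds


if __name__ == "__main__":
    # Hypothetical figures: the primary carried a 64-segment window and the
    # backup path has a 50 ms round-trip time.
    print(slow_start_rounds(64))
    print(ramp_up_delay(64, rtt_seconds=0.05))
```

Under these assumptions, the backup needs six round trips, about 0.3 seconds, before it carries the load the primary was handling, regardless of how much reserve bandwidth was provisioned.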