Packet communication involves a technique of disassembling information at the sending end of a switching network for insertion into separate bursts, or packets, of data and reassembling that information from the data packets at the receiving end of the network. Communication according to this technique is especially useful in common carrier or time-shared switching systems, since the communication path or circuit required for the packet transmissions is needed only while each packet is being forwarded through the network, and is, therefore, available to other users in the intervals between packet transmissions.
The communication circuits which may be shared in such packet networks include transmission lines, program controlled processors, ports or links, and data or packet buffers. In large multinode networks, each node or packet switch accommodates many such ports or links that terminate paths which extend to user terminal equipment or to other nodes. Each node may include one or more processors for controlling the routing and processing of packets through the node. The node is customarily equipped with a large number of buffers for storing packets in anticipation of such routing or awaiting availability of an output link. Each line between nodes or extending to users typically serves a plurality of concurrent calls or sessions between a plurality of calling and called parties or machine terminals.
One of the problems in large packet communication or packet switching systems arises when many users attempt to utilize the network at the same time. This results in the formation of many paths or circuits for routing the data; as a consequence, the communication facilities become congested and unavailable to a user or to the user's packet while it is being forwarded through the network. It has been found that congestion, if uncontrolled, tends to spread through the network. As a result, a number of flow control procedures, such as end-to-end windowing and link-by-link watermark flow controls, have been developed and commercially exploited.
A principal area of packet congestion is in the buffers, or queues, in each node, particularly where the buffers become unavailable to store incoming packets. One solution to the buffer congestion problem is to halt all incoming traffic on all incoming lines to the affected node when the packet buffers become filled, or congested, and no buffer is available for storing additional incoming packets.
In data switching applications where data is exchanged only between machines and humans (i.e., when there is no intermachine traffic), it is sometimes practical to design viable packet switching systems which have no provision for flow control; the inherent limitation on the average rate at which humans are able to interpret data can be exploited to make buffer, or queue, overflow probabilities sufficiently small. On the other hand, the average rate at which machines can generate or absorb data has no such universal limitation: it varies widely from one machine to the next, and may even be difficult to quantify for a particular machine, because it will frequently depend upon, for example, operating system specifics, application program specifics, and input-output hardware assists. Packet switching networks that handle nontrivial amounts of intermachine traffic must therefore have some means of controlling the flow of data into the network, both to protect buffers, or queues, within the network, and to protect destination machines.
The simple end-to-end windowing scheme for flow control has advantageous properties when viewed strictly from the network periphery. Each machine can have many sessions simultaneously established between itself and various other machines. For each of these sessions (referred to as a logical channel), a given machine is allowed to have p unacknowledged packets outstanding in the network (p is some fixed integer chosen large enough to allow uninterrupted transmission when the network is lightly loaded). The greater the end-to-end network delay, the larger p must be. For example, a machine can initially transmit p packets into the network as fast as it desires; but it then can transmit no more packets (on that particular logical channel) until it has received an acknowledgement from the destination machine for at least one of those outstanding packets. This scheme has several very desirable properties. There is very little wasted bandwidth caused by the flow-controlling mechanism, because the number of bits in an acknowledgement can be made very small compared to the number of bits in the p packets to which it refers. Automatic throttling also occurs under heavy load, dividing network capacity fairly among all traffic sources. Finally, the scheme provides automatic speed conversion between machines of different data rates because, for example, a destination can regulate the rate at which it acknowledges packets so that it will not be overwhelmed by too much data from an over-eager source.
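The window rule described above can be illustrated with a minimal sketch. The class name `WindowedChannel`, its methods, and the choice of p = 3 are hypothetical and chosen only for illustration; the sketch models a single logical channel and ignores packet loss, retransmission, and timing.

```python
class WindowedChannel:
    """One logical channel that allows at most p unacknowledged packets."""

    def __init__(self, p):
        self.p = p                  # window size (assumed fixed per channel)
        self.outstanding = set()    # sequence numbers sent but not yet acknowledged
        self.next_seq = 0

    def can_send(self):
        # Transmission is allowed only while fewer than p packets are outstanding.
        return len(self.outstanding) < self.p

    def send(self):
        # Returns the sequence number sent, or None if the window is closed.
        if not self.can_send():
            return None
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding.add(seq)
        return seq

    def acknowledge(self, seq):
        # An acknowledgement from the destination reopens one slot in the window.
        self.outstanding.discard(seq)


channel = WindowedChannel(p=3)
sent = [channel.send() for _ in range(5)]   # only the first 3 attempts succeed
channel.acknowledge(0)                      # destination acknowledges packet 0
after_ack = channel.send()                  # one more packet may now be sent
```

In this sketch the destination controls the source's rate simply by choosing when to call `acknowledge`, which is the speed-conversion property noted above.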
A disadvantage of the pure windowing scheme is that it may frequently require an unacceptably large amount of buffer storage within the packet switch. To ensure no loss of data, it is necessary to provide, at each buffer, or queue, in the network c × p packets of storage either (1) for every source whose packets might transit that queue, or (2) for every destination whose packets might be fed by that queue (where c is the maximum number of sessions that a source or destination is allowed to have simultaneously in progress). Since some buffers, or queues, may be positioned in such a way that they are fed by a large number of sources, or that they feed a large number of destinations, the amount of queuing required can be impractically large (especially if the packets contain more than just a few bytes).
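To give a sense of scale, the storage requirement of c times p packets per source (or per destination) can be worked through numerically. The function name and all of the figures below (200 sources, c = 16, p = 7, 128-byte packets) are hypothetical values chosen only for illustration and do not come from any particular network.

```python
def worst_case_queue_storage(endpoints, c, p, packet_bytes):
    """Worst-case storage at one queue under pure end-to-end windowing.

    endpoints    -- number of sources (or destinations) served by the queue
    c            -- maximum simultaneous sessions per source or destination
    p            -- window size in packets per session
    packet_bytes -- assumed packet size in bytes
    Returns (packets, bytes) of storage needed to guarantee no loss.
    """
    packets = endpoints * c * p
    return packets, packets * packet_bytes


# Hypothetical example: a queue fed by 200 sources.
packets_needed, bytes_needed = worst_case_queue_storage(
    endpoints=200, c=16, p=7, packet_bytes=128
)
print(packets_needed, bytes_needed)  # 22400 2867200
```

Under these assumed figures a single queue would need room for 22,400 packets, or roughly 2.9 million bytes, which illustrates how the requirement grows with the number of sources feeding a queue.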
Flow control utilizing a link-by-link watermark principle has each node keep track of its own buffer, or queue, length and send a "stop-sending" message upstream whenever the queue length exceeds some preestablished upper threshold. As soon as the queue length drops below a preestablished lower threshold, a "resume-sending" message is sent back upstream. The advantage of this scheme is that it is insensitive to the number and type of sources, and it results in the smallest possible queue requirements (because the delay between the sending of a "stop-sending" message and the actual cessation of transmission is minimal). However, each node must know how many links feed each of its queues, and must be able to generate and send the "stop-sending" and "resume-sending" messages out on the appropriate links. Deadlocking is also a potential problem. Illustratively, suppose that the next packet in a queue of a given node A is destined for a downstream node B, and suppose that node B has sent node A a "stop-sending" message. Node A typically has links to many other nodes besides node B, and there may well be many packets in node A's queue destined for those other nodes. If node A's queue is implemented with a simple hardware FIFO, the blocked packet at the front of the queue will also block all of the subsequent packets in the queue, even though their respective outgoing links are available. In the extreme case where node B dies, node A can be indefinitely tied up; and the blockage can ripple upstream with the result that the failure of a single node can incapacitate a large portion of the network.
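The two-threshold behavior of the watermark principle can be sketched as follows. The class name `WatermarkQueue`, the threshold values, and the `messages` list standing in for the upstream link are all hypothetical; the sketch covers a single queue at a single node and does not model the multi-link bookkeeping or the deadlock scenario described above.

```python
from collections import deque


class WatermarkQueue:
    """A node queue that signals upstream at upper and lower thresholds."""

    def __init__(self, upper, lower):
        if lower >= upper:
            raise ValueError("lower threshold must be below upper threshold")
        self.upper = upper
        self.lower = lower
        self.packets = deque()
        self.upstream_stopped = False
        self.messages = []          # stands in for messages sent upstream

    def enqueue(self, packet):
        self.packets.append(packet)
        # Exceeding the upper threshold triggers one "stop-sending" message.
        if not self.upstream_stopped and len(self.packets) > self.upper:
            self.upstream_stopped = True
            self.messages.append("stop-sending")

    def dequeue(self):
        packet = self.packets.popleft()
        # Dropping below the lower threshold triggers one "resume-sending".
        if self.upstream_stopped and len(self.packets) < self.lower:
            self.upstream_stopped = False
            self.messages.append("resume-sending")
        return packet


queue = WatermarkQueue(upper=4, lower=2)
for i in range(5):          # the fifth packet pushes the length above 4
    queue.enqueue(i)
for _ in range(4):          # the fourth removal drops the length below 2
    queue.dequeue()
```

The gap between the two thresholds provides hysteresis, so the node does not alternate between the two messages each time a single packet arrives or departs.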
Such a problem could be eliminated by segregating packets as they arrive at a node according to the link on which they will be leaving the node. If node B shuts off a link x, node A can now continue to send packets out on the other links. However, eventually, queue z of node A, which feeds link x, will exceed its threshold; and then all of the links coming into node A will have to shut off. The only way to fix this problem is to make all nodes which feed node A aware of the segregated queues at node A, so that they can be told to stop sending packets bound for a particular queue at node A. The nodes immediately upstream of node A must therefore segregate their incoming packets, not just according to their own outgoing links, but also according to node A's outgoing links. In fact, each node must have a separate queue for each outgoing link of every node in the final stage of the network that it is capable of reaching. The nodes, as well as the flow control messages themselves, must therefore be quite complex if local malfunctions are to have only local effects. It is advantageous to note that the above deadlocking tendency can be initiated, not only by a network node failure, but also by a single destination becoming inactive.
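The local effect of segregating packets by outgoing link, as compared with a single FIFO, can be sketched as below. The function names, the packet representation as (payload, outgoing link) pairs, and the link labels are hypothetical. The sketch shows only what one node can send while one link is shut off; it does not model the eventual overflow of the blocked queue or the upstream propagation discussed above.

```python
from collections import defaultdict, deque


def drain_single_fifo(packets, blocked_links):
    """Send from one shared FIFO; stop at the first packet whose link is blocked."""
    queue = deque(packets)
    sent = []
    while queue and queue[0][1] not in blocked_links:
        sent.append(queue.popleft())
    return sent


def drain_segregated(packets, blocked_links):
    """Sort packets into one queue per outgoing link; skip only blocked links."""
    queues = defaultdict(deque)
    for packet in packets:
        queues[packet[1]].append(packet)
    sent = []
    for link, queue in queues.items():
        if link in blocked_links:
            continue                # packets for this link wait in their own queue
        while queue:
            sent.append(queue.popleft())
    return sent


# Each packet is (payload, outgoing link); link "x" has been shut off downstream.
packets = [("a", "x"), ("b", "y"), ("c", "x"), ("d", "w")]
fifo_sent = drain_single_fifo(packets, blocked_links={"x"})
segregated_sent = drain_segregated(packets, blocked_links={"x"})
```

With the single FIFO nothing is sent, because the packet at the head is bound for the blocked link; with segregated queues the packets bound for links "y" and "w" still leave the node.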