This invention relates to a method and apparatus for congestion management in computer networks using explicit rate indication. More particularly, the invention is directed to a method wherein sources monitor their load and periodically provide that information to switches. The switches, in turn, compute the actual load level and ask the sources to adjust their rates up or down.
While the invention is particularly directed to the art of data congestion management, and will thus be described with specific reference thereto, it will be appreciated that the invention may have usefulness in other fields and applications.
The next generation of computer and telecommunication networks will likely use the asynchronous transfer mode (ATM). ATM networks are connection-oriented networks in which the information is transmitted using fixed-size 53-byte cells. The cells flow along predetermined paths called virtual channels (VCs). End systems set up constant bit rate (CBR) or variable bit rate (VBR) VCs before transmitting information. For data traffic, which is highly "bursty" and does not have strict delay requirements, it is best to dynamically divide all available bandwidth fairly among the VCs that need it at any moment of time. Such traffic is called available bit rate (ABR) traffic.
The main problem in supporting ABR traffic is that more traffic may enter a switch than can exit, so the switches can become congested. To control congestion, the switches typically notify the sources to reduce the traffic rate using a feedback mechanism. Known feedback techniques use a single bit having two values, 0 or 1, representing increase or decrease, respectively. The feedback step may require several executions before the sources adjust to the desired rate. An alternative technique for connection-oriented networks comprises the sending of a control cell from the switches to the source containing the desired rate.
Any time the total demand for a resource is more than the available resource, the problem of congestion arises. The bandwidth, buffers, and computational capacity are examples of resources in a network. The design goal of most network resource management algorithms is to provide maximum link bandwidth utilization while minimizing the buffers (queue length) and computation overhead.
In known congestion management schemes, the three performance measures most commonly used are efficiency, delay, and fairness. The desired optimal operation of congestion management methods is explained below, as are the known methods themselves.
For clarification, each virtual circuit has one source and one destination and passes through a number of switches. The terms "source" and "virtual circuit" (VC) are used interchangeably herein. The term "host" is used to denote an end system, which may have several VCs.
One of the first requirements for good performance is efficiency, or high throughput. In a shared environment, the throughput for a source depends upon the demands by other sources. The most commonly used criterion for what is the correct share of bandwidth for a source in a network environment is the so called "max-min allocation." It provides the maximum allocation possible to the source receiving the least among all contending sources. Mathematically, it is defined as follows.
Given a configuration with n contending sources, suppose the ith source gets a bandwidth x.sub.i. The allocation vector {x.sub.1, x.sub.2, . . . , x.sub.n } is feasible if all link load levels are less than or equal to 100%. The total number of feasible vectors is infinite. Given any allocation vector, the source that is getting the least allocation is in some sense, the "unhappiest source." Given the set of all feasible vectors, find the vector that gives the maximum allocation to this unhappiest source. Actually, the number of such vectors is also infinite although we have narrowed down the search region considerably. Now we take this "unhappiest source" out and reduce the problem to that of remaining n-1 sources operating on a network with reduced link capacities. Again, we find the unhappiest source among these n-1 sources, give that source the maximum allocation and reduce the problem by one source. We keep repeating this process until all sources have been given the maximum possible bandwidth.
The following example illustrates the above concept of max-min fairness. FIG. 1 shows a network with three switches connected via two 150 Mbps links. Four VCs are set up such that the first link L1 is shared by sources S1, S2 and S3. The second link L2 is shared by S3 and S4. Let us divide the link bandwidths fairly among contending sources. On link L1, we can give 50 Mbps to each of the three contending sources S1, S2 and S3. On link L2, we would give 75 Mbps to each of the sources S3 and S4. However, source S3 cannot use its 75 Mbps share at link L2 since it is allowed to use only 50 Mbps at link L1. Therefore, we give 50 Mbps to source S3 and construct a new configuration shown in FIG. 2, where source S3 has been removed and the link capacities have been reduced accordingly. Now we give 1/2 of link L1's remaining capacity to each of the two contending sources, S1 and S2; each gets 50 Mbps. Source S4 gets the entire remaining bandwidth (100 Mbps) of link L2. Thus, the fair allocation vector for this configuration is (50, 50, 50, 100). This is the max-min allocation.
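The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the described method: the function name `max_min_allocation` and the dictionary layout are assumptions made for the example, and every source is assumed to cross at least one link. Unlike the walkthrough, which removes S3 first, the sketch fixes every source on the tightest link in one step; for the configuration of FIG. 1 the result is the same.

```python
def max_min_allocation(link_capacity, routes):
    """Return {source: rate} by repeatedly saturating the tightest link.

    link_capacity: {link name: capacity in Mbps}
    routes: {source name: list of links that the source's VC crosses}
    """
    remaining = dict(link_capacity)
    active = set(routes)
    allocation = {}
    while active:
        # Equal share each link could offer to its still-unallocated sources.
        shares = {}
        for link, cap in remaining.items():
            users = [s for s in active if link in routes[s]]
            if users:
                shares[link] = cap / len(users)
        # The link offering the smallest share is the current bottleneck.
        bottleneck = min(shares, key=shares.get)
        share = shares[bottleneck]
        # Fix every source on the bottleneck at that share, then remove it
        # and reduce the capacity of every link it crosses.
        for s in [s for s in active if bottleneck in routes[s]]:
            allocation[s] = share
            active.remove(s)
            for link in routes[s]:
                remaining[link] -= share
    return allocation


# Configuration of FIG. 1: two 150 Mbps links, four sources.
links = {"L1": 150.0, "L2": 150.0}
routes = {"S1": ["L1"], "S2": ["L1"], "S3": ["L1", "L2"], "S4": ["L2"]}
result = max_min_allocation(links, routes)
print(result)
```

For this configuration the sketch yields 50 Mbps for S1, S2 and S3 and 100 Mbps for S4, matching the allocation vector derived in the text.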
Notice that max-min allocation is both fair and efficient. It is fair in the sense that all sources get an equal share on every link provided that they can use it. It is efficient in the sense that each link is utilized to the maximum load possible.
The max-min allocation is the desired goal. Any scheme that results in max-min allocation is called max-min fair. If a scheme gives an allocation that is different from the max-min allocation, its unfairness is quantified as follows.
Suppose a scheme allocates {x'.sub.1, x'.sub.2, . . . , x'.sub.n } instead of the max-min allocation {x*.sub.1, x*.sub.2, . . . , x*.sub.n }. Then, we calculate the normalized allocations x.sub.i = x'.sub.i / x*.sub.i for each source and compute the fairness index as follows: ##EQU1## Since the allocations usually vary with time, the fairness can be plotted as a function of time. Alternatively, throughputs over a given interval can be used to compute overall fairness.
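The equation itself is not reproduced in this text. As an illustration only, the sketch below assumes the widely used form of the fairness index, namely the square of the sum of the normalized allocations divided by n times the sum of their squares. Both that formula and the function name `fairness_index` are assumptions of this example rather than statements taken from the text above.

```python
def fairness_index(actual, max_min):
    """Fairness of `actual` allocations relative to the max-min allocation.

    Assumed formula: (sum of x_i)^2 / (n * sum of x_i^2), where x_i is the
    actual allocation of source i divided by its max-min allocation.
    """
    normalized = [a / m for a, m in zip(actual, max_min)]
    n = len(normalized)
    return sum(normalized) ** 2 / (n * sum(x * x for x in normalized))


max_min = [50.0, 50.0, 50.0, 100.0]

# A scheme that delivers exactly the max-min allocation.
perfect = fairness_index([50.0, 50.0, 50.0, 100.0], max_min)

# A scheme that favors the first source at the expense of the second.
skewed = fairness_index([75.0, 25.0, 50.0, 100.0], max_min)

print(perfect, skewed)
```

Under this assumed formula the index is 1 when every source receives its max-min share and falls below 1 as the allocation departs from it.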
The efficiency of a scheme relates to its making full use of its resources. A scheme that results in underload or overload is considered inefficient. Given a network, it is the bottleneck link (the link with maximum utilization) whose proper loading is important. Thus, an efficient scheme tries to control sources such that the bottleneck link is neither underloaded nor overloaded.
Given two schemes with the same fairness and efficiency, one with lower end-to-end delay is preferred. Generally, there is a tradeoff between efficiency and delay in the sense that if one tries to use a link to 100% capacity, the queue lengths may become too large and the delays may become excessive. While data traffic is generally delay insensitive, extremely large delays are harmful since they may result in timeouts at higher layers and result in unnecessary retransmissions. Therefore, it is often preferable to keep link utilizations below 90-95%.
Most practical schemes take some time to reach a fair and efficient operating point. Given two schemes with the same fairness and efficiency at the end of simulation, the one which achieves efficiency and fairness faster is preferred. This preference is used to compare different design alternatives. Given the same starting point, the time taken to reach steady state is compared and the alternative producing faster convergence is selected. The steady state is defined informally as a small region around the final operating point. With deterministic simulations, it is relatively easy to identify the steady state since the system starts to oscillate around the final point.
The problem of congestion control and/or management has been known to be the critical part of network architecture design for several decades and hundreds of papers have been written on various schemes. Rather than provide a background survey of all schemes, selected schemes that are (or were) leading candidates for adoption in ATM networks will be considered. However, the present invention is equally adaptable to non-ATM networks.
At the ATM Forum, which is an organization of over 400 computer and telecommunication equipment manufacturers, the traffic management subgroup is responsible for uncovering the most desirable congestion control scheme. In particular, the congestion control for the so-called "available bit rate (ABR)" traffic has been given special consideration since approximately May, 1993. By September of 1993, two distinct approaches emerged: the credit-based scheme and the rate-based scheme.
The credit-based approach consists of using window (or credit) based flow control on every link. Each node (switch or source) keeps a separate queue for each VC. At each hop, the receiving node tells the transmitting node how many cells it can send for each VC. The number of cells that can be transmitted is called "credits". The number of cells received is carefully monitored so that lost cells can be detected. This approach has the potential to provide full link utilization and guarantee zero loss due to congestion. However, this scheme requires per-VC queuing, per-VC service, and per-VC monitoring. The number of VCs that exist at any time is large and, therefore, per-VC operations are considered undesirable by most switch manufacturers. It is preferable to keep all per-VC operations (except switching) at the end systems. The complexity and cost of implementation have been the main objection to this approach. Vendors are typically not willing to pay the high cost of per-VC operations for the noble goal of "zero loss." A small probability of loss is preferred, particularly if it results in considerable savings in cost.
The rate-based approach is based on end-to-end rate control using feedback from the network. Initially, a backward explicit congestion notification (BECN) method was proposed. However, a forward explicit congestion notification (FECN) was subsequently considered instead of the BECN method. In either case, the cells contain a single bit which is marked by the switches if congested. In FECN, the destination end station monitors these bits and sends a control cell back to the source asking it to adjust the rate up or down. In the BECN version, the congested switches directly send the control cell to the source (and the bit is actually not required).
A sequence of FECN schemes has been proposed at the ATM Forum. The latest one is called the Proportional Rate Control Algorithm (PRCA). In this proposal, the sources would set the FECN bit to one except in every nth cell (where n is a parameter). The switches set the bit to one when they are congested (and do nothing if not congested). If the destination receives a cell with the FECN bit set to zero, it concludes that the network is not congested and sends a control cell to the source asking it to increase its rate. The sources continually decrease their rates (after sending each cell) unless they receive the control cell from the destination. A multiplicative decrease and additive increase is used to achieve fairness.
The single bit feedback, while satisfactory for window-based schemes, is too slow for rate-based schemes. In a window-based scheme, if the control is slow to change (and therefore remains constant for a while), the queue length cannot exceed the specified window size. This is not true for rate-based schemes. If the rate exceeds the optimal rate even by a small amount, the queues will keep building, leading to overflow and cell loss. It is important to measure the rate quickly and inform the sources of the rate as soon as possible. These considerations led to the following two explicit rate indication proposals at the ATM Forum meeting of July, 1994: the MIT scheme and the UCI scheme.
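The source behavior described above (decrease after every cell sent, increase only when the destination returns a control cell) can be sketched as follows. This is a hypothetical illustration: the class name `PrcaSource`, the parameter names, and the numeric values are all assumptions of this example, since the text does not give the PRCA parameters.

```python
class PrcaSource:
    """Toy source: multiplicative decrease per cell, additive increase on feedback."""

    def __init__(self, rate, decrease_factor, increase_step, min_rate, peak_rate):
        self.rate = rate
        self.decrease_factor = decrease_factor  # assumed, between 0 and 1
        self.increase_step = increase_step      # assumed, in Mbps
        self.min_rate = min_rate
        self.peak_rate = peak_rate

    def on_cell_sent(self):
        # The rate decays after every transmitted cell.
        self.rate = max(self.min_rate, self.rate * self.decrease_factor)

    def on_increase_cell(self):
        # A control cell from the destination means "no congestion seen".
        self.rate = min(self.peak_rate, self.rate + self.increase_step)


src = PrcaSource(rate=100.0, decrease_factor=0.5, increase_step=30.0,
                 min_rate=1.0, peak_rate=150.0)
src.on_cell_sent()        # 100 * 0.5
after_decrease = src.rate
src.on_increase_cell()    # 50 + 30
after_increase = src.rate
print(after_decrease, after_increase)
```

The decrease factor of 0.5 is chosen only to keep the arithmetic easy to follow; a real source would decay far more gently per cell.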
The MIT scheme, developed at the Massachusetts Institute of Technology, consists of the sources periodically sending their rates to switches in control cells. The switches reduce the rate value if necessary. The cells are returned to the source by the destination node.
The control cells contain a "Reduced bit" and the source's "Desired rate." Each switch monitors its traffic and calculates its available capacity per VC. This quantity is called the "fair share."
If the "desired rate" is higher than or equal to the "fair share", the desired rate is reduced to the "fair share" and the reduced-bit is set. If the desired rate is less than the fair share, the switch does not change the fields of the control cell.
The destination sends the control cell back to the source. If the source finds the reduced bit set, it adjusts its rate to that returned in the "desired rate" field of the control cell. Next time, the source sends this new rate in the next control cell transmitted. If the reduced bit is clear, the source can increase its rate but it must first determine how much it can increase by sending a control cell with a higher desired rate.
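The control-cell handling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `ControlCell`, `switch_process`, `traverse` and `source_update`, and the `probe_step` parameter, are hypothetical. The text does not say by how much a source raises its request when the reduced bit is clear, so a fixed step is assumed here.

```python
from dataclasses import dataclass


@dataclass
class ControlCell:
    desired_rate: float
    reduced: bool = False


def switch_process(cell, fair_share):
    """Apply one switch's rule to a control cell passing through it."""
    if cell.desired_rate >= fair_share:
        cell.desired_rate = fair_share
        cell.reduced = True
    return cell


def traverse(cell, fair_shares):
    """Pass the cell through every switch on the path, in order."""
    for fair_share in fair_shares:
        switch_process(cell, fair_share)
    return cell


def source_update(cell, probe_step):
    """Return (rate to use now, desired rate to put in the next control cell)."""
    if cell.reduced:
        # Some switch lowered the request: adopt and re-advertise that rate.
        return cell.desired_rate, cell.desired_rate
    # Request granted in full: keep it and probe for more (assumed policy).
    return cell.desired_rate, cell.desired_rate + probe_step


path_fair_shares = [80.0, 50.0, 120.0]  # hypothetical fair shares, in Mbps
high = traverse(ControlCell(desired_rate=100.0), path_fair_shares)
low = traverse(ControlCell(desired_rate=40.0), path_fair_shares)
print(high, source_update(high, probe_step=10.0))
print(low, source_update(low, probe_step=10.0))
```

In this example the 100 Mbps request is cut to 50 Mbps by the second switch, while the 40 Mbps request passes through unchanged and the source is free to ask for more.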
Each switch maintains a list of all of its VCs and their last seen desired rates. All VCs whose desired rate is higher than the switch's fair share are considered "overloading VCs." Similarly, VCs with a desired rate below the fair share are called "underloading VCs." The underloading VCs are bottlenecked at some other switch and, therefore, cannot use additional capacity at this switch even if available.
The capacity unused by the underloading VCs is divided equally among the overloading VCs. Thus, the fair share of the VCs is calculated as follows: ##EQU2## It is possible that after this calculation some VCs that were previously underloading with respect to the old fair share can become overloading with respect to the new fair share. In this case, these VCs are re-marked as overloading and the fair share is recalculated.
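The fair share equation is not reproduced in this text. The sketch below assumes the natural reading of the sentence that precedes it: the fair share equals the link capacity minus the total desired rate of the underloading VCs, divided by the number of remaining (overloading) VCs. That formula, the function name `mit_fair_share`, and the choice to start from an equal split of the capacity are assumptions of this example.

```python
def mit_fair_share(capacity, desired_rates):
    """Iteratively compute a fair share from the last seen desired rates."""
    n = len(desired_rates)
    fair_share = capacity / n  # assumed starting point: equal split
    for _ in range(n + 1):
        # VCs asking for less than the fair share are bottlenecked elsewhere.
        under = [r for r in desired_rates if r < fair_share]
        if len(under) == n:
            break  # no overloading VC; this link is not a bottleneck
        # Capacity left unused by underloading VCs is split among the rest.
        new_share = (capacity - sum(under)) / (n - len(under))
        if new_share == fair_share:
            break  # the marking did not change, so the share is stable
        fair_share = new_share
    return fair_share


# Four VCs on a 150 Mbps link: two small requests, two large ones.
share = mit_fair_share(150.0, [10.0, 30.0, 100.0, 100.0])
print(share)
```

Here the two small VCs keep their 10 and 30 Mbps, and the remaining 110 Mbps is split between the two large VCs. The loop visits every VC on each pass, which reflects the per-VC cost of keeping such a list.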
The MIT scheme has been modified slightly by researchers at the University of California, Irvine ("UCI"). The switch algorithm is simplified. The switch does not "remember" any VC's rate. Instead, it computes an exponentially weighted average of the declared desired rates and uses the average as a fair share. The weighting coefficient used for averaging is different during overload and during underload. The MIT scheme requires an order n, O(n), computation in the sense that the number of instructions needed to compute the fair share increases linearly with the number of VCs. The UCI modification makes it of order 1, O(1), in the sense that the computational overhead to process a control cell does not depend upon the number of VCs. However, its ability to achieve efficient and fair operation is questionable.
The use of exponentially weighted average of "desired rates" as the fair share does not seem meaningful. First of all, "desired rates" may not be close to the actual transmission rates. Secondly, any average is meaningful only if the quantities are related and close to each other. The desired rates of various sources can be far apart. Thirdly, the exponentially weighted average may become biased towards higher rates. For example, consider two sources running at 1000 Mbps and 1 Mbps. In any given interval, the first source will send 1000 times more control cells than the second source and so the exponentially weighted average is very likely to be 1000 Mbps regardless of the value of the weight used for computing the average.
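The bias described above can be illustrated with a small simulation. This is a sketch under stated assumptions: the function name `ewma_trace`, the starting value, the three weights, and the deterministic ordering of control cells (1000 cells from the fast source followed by one cell from the slow source in each interval) are all choices made for this example.

```python
def ewma_trace(samples, alpha, start):
    """Return the exponentially weighted average after each sample."""
    avg = start
    trace = []
    for sample in samples:
        avg = (1 - alpha) * avg + alpha * sample
        trace.append(avg)
    return trace


# In each interval the 1000 Mbps source contributes 1000 control cells
# and the 1 Mbps source contributes a single one.
one_interval = [1000.0] * 1000 + [1.0]
samples = one_interval * 5

means = {}
for alpha in (0.1, 0.5, 0.9):
    trace = ewma_trace(samples, alpha, start=1.0)
    # Average value of the "fair share" seen over the whole run.
    means[alpha] = sum(trace) / len(trace)

print(means)
```

For each of the three weights, the average stays close to 1000 Mbps over the run, since the slow source's single control cell is quickly outweighed by the many cells from the fast source.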
The present invention contemplates a new and improved congestion management method which resolves the above-referenced difficulties and others.