ATM is a high-speed, connection-oriented packet switching technique in which information is organized into fixed-length packets called cells. In general terms, an ATM network is a collection of end systems interconnected through one or more ATM switches. On a connection, an end system generally both sends data to, and receives data from, the other end systems on the network involved in the connection. When an end system is a sender of data, it is referred to as a source; when it is a receiver of data, it is referred to as a destination. Typically an end system acts as both source and destination.
The following documents are referred to as prior art:
[1] "ATM Forum Traffic Management Specification, Version 4.0" (Draft), S. S. Sathaye, ATM Forum/95-0013R9, December 1995.
[2] "Enhanced PRCA (Proportional Rate-Control Algorithm)", L. Roberts, ATM Forum/94-0735R1, August 1994.
[3] "NIST ER Switch Mechanism (An Example)," N. Golmie et al., ATM Forum/95-0695, June 1995.
[4] "Intelligent Congestion Control for ABR Service in ATM Networks," K.-Y. Siu and H.-Y. Tzeng, Computer Communication Review, Vol. 24, No. 4, pp. 81-106, October 1995.
[5] "Example Switch Algorithm for Section 5.4 of TM spec," A. Barnhart, ATM Forum/95-0195, February 1995.
[6] "A Sample Switch Algorithm," R. Jain et al., ATM Forum/95-0178R1, February 1995.
[7] "ERICA+: Extensions to the ERICA Switch Algorithm," R. Jain et al., ATM Forum/95-1346, October 1995.
[8] "The Rate-Based Flow Control Framework for the Available Bit Rate ATM Service," F. Bonomi and K. Fendick, IEEE Network, Vol. 9, No. 2, pp. 25-39, March/April 1995.
[9] Applicant's pending application Serial No. 08/634,488 filed on Apr. 18, 1996, "Flow Control of ABR Traffic in ATM Networks", now U.S. Pat. No. 5,754,530 issued on May 19, 1998.
First, CCR represents the maximum rate that the VC can use (i.e. ACR) at the time the RM cell was transmitted by the source. However, it is expected that an ABR VC may not use all of its ACR, and thus ACR will be larger than the actual rate of the VC. Therefore, CCR does not necessarily reflect the actual rate of the VC, even just before the RM cell is transmitted (by the source).
Second, when the RM cell is received by an intermediate switch and its CCR field is read by the switch, the CCR field will be too old to even reflect the current value of ACR back at the source, not to mention its current actual rate.
Third, a bad (but perhaps smart) source may insert the wrong ACR value in its CCR field, hoping to acquire more rate by doing so.
If this happens with any of the known ABR switch mechanisms, the network may suffer (in terms of buffer overflows, etc.) and major fairness problems may arise.
According to [1] above, five service categories, differing in traffic characteristics and/or service guarantees, are defined for ATM networks. They are: (1) CBR (constant bit rate); (2) rt-VBR (real-time variable bit rate); (3) nrt-VBR (non-real-time variable bit rate); (4) UBR (unspecified bit rate); and (5) ABR (available bit rate).
ABR is the most recent among the above service categories. ABR is intended mainly for non-real-time data applications with varying and/or unknown bandwidth requirements and which cannot be easily characterized in terms of a peak cell rate, a sustainable cell rate, and a maximum burst size. Furthermore, it is the only one among the above service categories that is inherently closed loop. Example types of applications for ABR are any UBR application for which the user wants a more reliable service, critical data transfer (e.g. defence information), super computer applications and data communications applications requiring better delay behavior, such as remote procedure call, distributed file service (e.g. NFS), or computer process swap/paging [1].
The source of an ABR VC (virtual connection) periodically creates and sends special control cells called RM (resource management) cells which travel through the same path as data cells of the VC all the way to the destination of that VC. The destination then loops these cells back to the source through the same path. When an RM cell is traveling from the source to the destination, it is referred to as a forward RM cell; when it is traveling from the destination to the source, it is referred to as a backward RM cell. FIG. 1 shows an RM cell with all of its fields. FIG. 2 shows those fields which have bearings on the present invention. As seen in FIG. 2, some of these fields are intended for information-sharing only and are thus read-only fields and others are read/write fields which may be modified by intermediate switches and/or the destination.
The source adjusts its ACR (allowed cell rate) based on the feedback carried by returning RM cells. ACR represents the rate the source is using to control its cell transmission for the VC. The value of ACR at the time a particular forward RM cell is transmitted is inserted in that cell's CCR field. It is expected that end systems which comply with the source and destination reference behaviors recommended by the above-cited ATM Forum Specification [1] will experience a low cell loss ratio and obtain a fair share of the available bandwidth. According to one definition, a fair share for a VC is a function of its MCR (minimum cell rate), which is negotiated during connection setup, as well as the MCRs of the other VCs sharing one or more links with it.
Five different fairness criteria are described in Section I.3 of [1]. The first criterion (called "Max-Min") applies only to the case where all ABR VCs are unweighted (or equally weighted) and have zero MCRs; both restrictions are unrealistic. The third criterion (called "Maximum of MCR or Max-Min share") does not place any direct restriction on MCRs, but it requires a long iteration time to converge to the equilibrium point. The fourth criterion (called "Allocation proportional to MCR") cannot be used if there are ABR VCs with zero MCRs.
The above leaves only two useful fairness criteria for ABR, namely the second and the fifth. The second criterion (called "MCR plus equal share") requires that each active ABR VC get its contracted MCR plus an equal share of the available elastic bandwidth (the latter is obtained after subtracting the MCRs of all active ABR VCs from the available bandwidth). With the fifth criterion (called "Weighted allocation"), the bandwidth allocation for an ABR VC, say VC[vc_no], is proportional to its pre-determined weight, w[vc_no]. The weight of a given ABR VC may or may not be related to its MCR.
In addition to the PTI (payload type identifier) field which distinguishes between RM and data cells, and the VCI/VPI (virtual connection identifier/virtual path identifier) fields which identify the connection of a cell, important fields of an RM cell in the context of this invention are the following (see FIG. 2).
DIR (direction) bit:
When the source creates an RM cell, it sets DIR=0 indicating that this is a forward RM cell. Before the destination loops back an RM cell, it changes DIR to 1 indicating that this is now a backward RM cell. The DIR bit may not be altered by intermediate switches.
CCR (current cell rate) field:
This is a read-only field that contains the value of ACR at the transmission time of this RM cell.
MCR (minimum cell rate) field:
This is a read-only field that contains the contracted MCR. MCR is a minimum guaranteed rate. The source's rate need never be less than MCR.
ER (explicit rate) field:
This is a read/write field. Before the source transmits an RM cell, it should set this field to the desired rate (typically, the peak cell rate of the connection). An intermediate switch along the connection's path may reduce the value of the ER field in RM cells passing through it to whatever value it can support. However, an intermediate switch may never increase the value of the ER field in an RM cell passing through it.
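The field rules above can be sketched in a few lines of Python. This is only an illustration, not part of the ATM Forum specification: the class `RMCell` and the helper names `make_forward_rm`, `switch_update_er` and `destination_turnaround` are hypothetical, and rates are plain floats in arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class RMCell:
    direction: int  # DIR bit: 0 = forward RM cell, 1 = backward RM cell
    ccr: float      # read-only: ACR at the time the cell was transmitted
    mcr: float      # read-only: contracted minimum cell rate
    er: float       # read/write: explicit rate, may only be reduced en route


def make_forward_rm(acr, mcr, pcr):
    """Source side: DIR=0, CCR=ACR, ER set to the desired rate (here PCR)."""
    return RMCell(direction=0, ccr=acr, mcr=mcr, er=pcr)


def switch_update_er(cell, supported_rate):
    """Intermediate switch: may lower ER, never raise it; DIR is untouched."""
    cell.er = min(cell.er, supported_rate)
    return cell


def destination_turnaround(cell, supported_rate):
    """Destination: flips DIR to 1 and may also reduce ER before looping back."""
    cell.direction = 1
    cell.er = min(cell.er, supported_rate)
    return cell
```

Using `min` in both update helpers is what enforces the rule that ER can only decrease along the path; CCR and MCR are never written after the cell is created.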
While ATM Forum specification [1] cited above specifies a reference behavior for an ABR end system (e.g. in terms of the generation and handling of RM cells, adjusting ACR, etc.), the ABR switch behavior is largely unspecified and is left as implementation specific. In particular, the method by which a switch monitors its traffic and updates the ER fields of ABR RM cells passing through it is left as implementation specific.
To understand the objectives of an ABR switch mechanism, it is important to first understand the role of ABR within an ATM network (from a network point of view). In general, it is assumed that ABR will have access to the excess bandwidth left unused by "higher" priority traffic classes, namely, VBR and CBR. The handling of CBR bandwidth is straightforward, since CBR consumes a fixed amount of link bandwidth. However, VBR traffic is bursty by definition which causes the amount of bandwidth available to ABR to fluctuate.
Thus, the function of an ABR switch mechanism is to provide each ABR VC with the "right" rate allocation (inserted in the ER field of ABR RM cells passing through it) with the following two (somewhat conflicting) goals in mind: rapid stabilization to high link utilization and small queue sizes at intermediate switches. Furthermore, the bandwidth available to ABR should be shared in a fair manner among contending ABR VCs. What makes the latter objective non-trivial to achieve is the existence of heterogeneous MCRs and weights (i.e. different ABR VCs may have different MCRs and different weights). For example, assume that two ABR VCs, A and B, share a 100 Mbps link where MCR for VC A is 60 Mbps and MCR for VC B is 0 Mbps. Furthermore, assume that the weight of VC A is 1 while that of VC B is 3. Then a straightforward division of the bandwidth would result in VC A getting 50 Mbps, which is even less than its guaranteed minimum of 60 Mbps. However, the fair share for VC A is 60+(100-60)*(1/4)=70 Mbps and the fair share for VC B is 0+(100-60)*(3/4)=30 Mbps. This example demonstrates how the only two useful ABR fairness criteria of [1] which are #2 (called "MCR plus equal share") and #5 (called "Weighted allocation") can be combined. In this case, each ABR VC gets its MCR plus a weighted share of the available elastic bandwidth.
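The arithmetic of this example can be reproduced with a short Python sketch. The function name `mcr_plus_weighted_share` and its input layout are assumptions made for illustration; the sketch also assumes that the sum of the MCRs does not exceed the link rate and that the total weight is positive.

```python
def mcr_plus_weighted_share(link_rate, vcs):
    """Return each VC's MCR plus its weighted share of the elastic bandwidth.

    vcs maps a VC name to a (mcr, weight) pair; all rates share one unit.
    """
    total_mcr = sum(mcr for mcr, _ in vcs.values())
    total_weight = sum(weight for _, weight in vcs.values())
    elastic = link_rate - total_mcr  # bandwidth left after all MCRs are served
    return {
        name: mcr + elastic * weight / total_weight
        for name, (mcr, weight) in vcs.items()
    }


shares = mcr_plus_weighted_share(100.0, {"A": (60.0, 1.0), "B": (0.0, 3.0)})
print(shares)  # {'A': 70.0, 'B': 30.0}
```

With equal weights the same function reduces to criterion #2 ("MCR plus equal share"), and with all MCRs at zero it reduces to criterion #5 ("Weighted allocation").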
It should be noted that this invention intentionally ignores the binary mode of operation for ABR, where switches experiencing congestion set a congestion flag in the headers of data cells and/or set special bits in the RM cells. This is because it is well-known that the explicit rate (ER) mode operation for ABR (where switches know what rates they can support and convey these rates to the ABR sources involved) has far superior performance compared to that of the binary mode [8].
None of these existing ABR switch mechanisms takes into account the MCR values of the contending ABR VCs in their ER calculations (not to mention any possible VC weights). They all make the assumption that all ABR VCs have the same MCR, and thus there is no need to take MCR into account. Clearly, in a real network, different ABR VCs may have different MCRs (and different weights) and as such any good ABR switch algorithm should take MCR (and VC weight) explicitly into account. Furthermore, while modifying some of the existing algorithms to account for MCR may be possible, this is not necessarily true for all of them. In particular, modifying ERICA [6] to account for MCR does not seem to be an easy task.
In addition to the above problem of not taking MCR into account, existing ABR switch mechanisms also suffer from what is known as the "CCR-reliance" problem. This can be explained as follows. All existing ABR switch mechanisms read the CCR fields of RM cells passing through them and use these fields in the calculations of the ERs. An implicit assumption that the existing mechanisms make is that the CCR field read from an RM cell on a given VC represents the actual rate of the VC at the moment of reading the field. This is not true for the following reasons:
To avoid the CCR-reliance problems, per ABR VC rate measurements have to be performed. However, this has been avoided in existing ABR switch mechanisms because it was thought that this (i.e. measuring rates per ABR VC) is too complex to implement and may require scanning of ABR VCs to update their measured arrival rates.
EPRCA [2] was the first ER switch mechanism to be proposed. In EPRCA, two congestion states are defined: congested when the queue size exceeds some threshold, and very congested when the queue size exceeds a larger threshold. EPRCA maintains a running weighted exponential average, called MACR, of the CCR fields of all ABR VCs (MACR = MACR*15/16 + CCR*1/16). This is done by first intercepting any forward RM cell passing through the link under consideration and reading its CCR field. The newly read CCR is allowed to trigger an update to the MACR average only if the link is congested (as defined above) and CCR < MACR, or if the link is not congested (as defined above) and CCR > MACR*7/8. When a backward RM cell is received and the link is very congested, then ER = min(ER in cell, MACR*1/4). Otherwise, if the link is just congested and CCR > MACR*7/8, then ER = min(ER in cell, MACR*15/16).
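The EPRCA rules just described can be sketched as follows. This is a reading of the paragraph above, not the text of [2]: the class name `EprcaLink`, the threshold parameters and the way the queue length is supplied are all assumptions made for illustration.

```python
class EprcaLink:
    """Per-link EPRCA state as described above (illustrative sketch only)."""

    def __init__(self, macr, congested_threshold, very_congested_threshold):
        self.macr = macr
        self.congested_threshold = congested_threshold
        self.very_congested_threshold = very_congested_threshold
        self.queue_len = 0  # assumed to be updated by the queueing logic

    def congested(self):
        return self.queue_len > self.congested_threshold

    def very_congested(self):
        return self.queue_len > self.very_congested_threshold

    def on_forward_rm(self, ccr):
        """Update MACR only under the two conditions given in the text."""
        congested = self.congested()
        if (congested and ccr < self.macr) or (
            not congested and ccr > self.macr * 7 / 8
        ):
            self.macr = self.macr * 15 / 16 + ccr * 1 / 16

    def on_backward_rm(self, er_in_cell, ccr):
        """Return the ER value to write into a backward RM cell."""
        if self.very_congested():
            return min(er_in_cell, self.macr * 1 / 4)
        if self.congested() and ccr > self.macr * 7 / 8:
            return min(er_in_cell, self.macr * 15 / 16)
        return er_in_cell  # no reduction otherwise
```

Note that every decision hinges on queue-threshold crossings and on the CCR value carried in the cell, which is the root of the problems attributed to EPRCA.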
It has been shown by many ATM Forum contributions that EPRCA suffers from the following problems: oscillations, link under-utilization, unfairness, and parameter sensitivity. These problems are in part due to the way EPRCA defines congestion (i.e. queue threshold crossing). The problems are expected to become even worse with the introduction of VBR into the network, since EPRCA takes VBR traffic into account indirectly through the queue length measurements.
ERICA ([6] and [7]) uses a different approach to arrive at the right ER for each ABR VC, and operates with a target link rate in mind. ERICA uses a count-based measurement interval: for every N cells received (it is unclear whether this count includes ABR cells only, or all cells), the following link variables are updated:
______________________________________
Nactive = the number of ABR VCs seen in this interval
VBR_Input_Rate = number of VBR cells received/interval duration
ABR_Capacity = target rate - VBR_Input_Rate
Load_Factor = ABR_Input_Rate/ABR_Capacity
Fair_Share = ABR_Capacity/Nactive
______________________________________
Also, the flag Connection_Seen is reset for all ABR VCs (this requires scanning all ABR VCs at the end of each measurement interval). The CCR fields are read from forward RM cells and stored, and are then used to update the ER fields of backward RM cells according to the following equations.
______________________________________
ERS = max(Fair_Share, CCR/Load_Factor)
ER = min(ER in cell, ERS)
______________________________________
Furthermore, multiple backward RM cells on the same ABR VC which are seen during the same measurement interval are given the same ER value; this is done in order to avoid oscillations. This adds two more variables per ABR VC: a flag to indicate that a backward RM cell has been seen on the VC in this measurement interval, and the rate that was last given to each ABR VC. The flag (together with the above-mentioned Connection_Seen flag) must be reset for all ABR VCs at the end of each measurement interval.
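A minimal Python sketch of the ERICA computations listed above follows. The function names are hypothetical. The text does not define ABR_Input_Rate, so the sketch assumes it is the number of ABR cells received divided by the interval duration, by analogy with the VBR input rate. It also assumes a non-zero Nactive, a positive ABR capacity and a positive load factor, and it omits the per-VC flags and the per-interval reuse of the last ER value.

```python
def erica_interval_update(target_rate, vbr_cells, abr_cells,
                          interval_duration, n_active):
    """Recompute the link variables at the end of a measurement interval."""
    vbr_input_rate = vbr_cells / interval_duration
    abr_input_rate = abr_cells / interval_duration  # assumed definition
    abr_capacity = target_rate - vbr_input_rate
    load_factor = abr_input_rate / abr_capacity
    fair_share = abr_capacity / n_active
    return load_factor, fair_share


def erica_er(er_in_cell, ccr, load_factor, fair_share):
    """ER written into a backward RM cell: min(ER in cell, ERS)."""
    ers = max(fair_share, ccr / load_factor)
    return min(er_in_cell, ers)
```

For example, with a target rate of 100, 20 VBR cells and 120 ABR cells in a unit-length interval and Nactive = 4, the load factor is 1.5 and the fair share is 20; a VC whose stored CCR is 60 is then offered max(20, 60/1.5) = 40.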
No independent evaluation of ERICA has been published. However, the present inventors have conducted extensive simulation studies of ERICA under various network scenarios and found that ERICA (at least in its published form) suffers from several serious problems. They include:
Scalability:
Ignoring VBR traffic for now, ERICA is extremely sensitive to the choice of the measurement interval length. If the recommended 30-cell length (for OC-3 links) is used, that means that a maximum of only 30 ABR VCs can be declared as active in any given measurement interval. This would cause the first term in ERICA's main equation (i.e., Fair_Share = ABR_Capacity/Nactive) to be excessively large, causing problems for the switch since this Fair_Share will be provided to all non-idle ABR VCs whether they have contributed to Nactive or not. On the other hand, if the length of the measurement interval is chosen to be large enough (say, 10 times the number of VCs which are set up to pass through the link under consideration), then so much activity may be lost and ERICA would be too slow to react to both congestion and under-utilization problems. If the length of the measurement interval is chosen somewhere in between, severe oscillation would occur. Another aspect of scalability concerns the requirement in ERICA to scan all ABR VCs at the end of each measurement interval in order to reset two per-VC flags.
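The size of the first effect can be estimated with a small calculation. The sketch below is a simplification under stated assumptions: every contending VC is non-idle and is offered exactly Fair_Share, and Nactive saturates at the interval length in cells. The function name `granted_over_capacity` is hypothetical.

```python
def granted_over_capacity(n_contending, interval_cells=30):
    """Ratio of the total rate offered via Fair_Share to ABR_Capacity.

    Nactive cannot exceed the number of cells in the interval, so
    Fair_Share = ABR_Capacity / Nactive stops shrinking once the number
    of contending VCs exceeds interval_cells.
    """
    n_active = min(n_contending, interval_cells)
    return n_contending / n_active


print(granted_over_capacity(20))   # 1.0
print(granted_over_capacity(300))  # 10.0
```

Under these assumptions, 300 contending VCs on a link with a 30-cell interval would together be offered ten times the capacity actually available to ABR.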
Fairness:
It can be shown that, even with MCR=0 (and equal weights) for all ABR VCs, ERICA may not necessarily achieve the max-min fairness criterion in certain network scenarios. For example, it is assumed that the measurement interval length is chosen in such a way that the above-mentioned problems are avoided (it is not certain that this is possible). It is further assumed that a large number of ABR VCs exist on a link where all of them (except, say, for one VC) are rate limited. In this case, the ACRs of the rate limited VCs will be determined through the first term in ERICA's main equation (i.e. Fair_Share = ABR_Capacity/Nactive). On the other hand, the ACR of the only greedy VC will be determined through the second term in ERICA's main equation (i.e. CCR/Load_Factor). Now, if one of the initially rate limited VCs decides that it wants to become greedy and uses all of the bandwidth that it is given, it may not be able to share the link fairly with the VC that initially started as greedy. This happens because the network locks into the wrong set of rate allocations.
On a given ABR VC, the source sends data cells to the destination through one or more intermediate ATM switches. Furthermore, in accordance with the ATM Traffic Management Specification [1] referred to above, the source periodically creates and sends forward RM cells. The aggregate traffic sent by the source is dynamically shaped to ACR (allowed cell rate), which is controlled by the network. The forward RM cells travel through the same path as that of the data cells on the same VC. These forward RM cells are characterized by having their DIR bit set to 0 (see FIG. 2) and carry information about the source that can be useful to the network. In particular, the source inserts its current ACR in the CCR (current cell rate) field of the forward RM cell. Also, it inserts its contracted MCR (minimum cell rate) in the MCR field. Furthermore, it indicates its desired rate through the ER (explicit rate) field; typically, the ER field will be initially set to PCR (peak cell rate) of the connection.
When the destination receives a forward RM cell, it changes the DIR bit to 1 indicating that the cell is now a backward RM cell, reduces the ER field to whatever value it can support, and finally loops the RM cell back to the source. The backward RM cells travel through the same path as that of the data and forward RM cells on the same VC, but in the reverse direction.
In accordance with the ATM Forum Traffic Management Specification [1], intermediate switches have the option of intercepting forward and/or backward RM cells to reduce their ER fields to whatever values they can support as long as fairness is maintained. Thus, when an RM cell finally returns to its originating source, its ER field would reflect the maximum possible rate on the most congested link along the connection's path. The source then adjusts its ACR using:
______________________________________
ACRnew = min(ER in returning RM cell, ACRcurrent + RIF*PCR)
______________________________________
where RIF (rate increase factor) is a parameter that is determined during connection set-up. The source always ensures that the resulting ACR is never below MCR or above PCR. It can be seen that when RIF=1, ACR is totally controlled by the ER values in returning RM cells.
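The source-side update can be sketched as below. The function `update_acr` is a hypothetical name, and the sketch covers only the equation and the MCR/PCR bounds stated above; it does not attempt to reproduce the other source behaviors defined in [1].

```python
def update_acr(acr_current, er_returned, rif, pcr, mcr):
    """Apply ACRnew = min(ER, ACRcurrent + RIF*PCR), bounded by [MCR, PCR]."""
    acr_new = min(er_returned, acr_current + rif * pcr)
    return max(mcr, min(acr_new, pcr))


# Additive increase limits the step when RIF is small...
print(update_acr(10.0, 80.0, 1 / 16, 160.0, 5.0))  # 20.0
# ...whereas with RIF=1 the returned ER alone determines ACR.
print(update_acr(10.0, 80.0, 1.0, 160.0, 5.0))     # 80.0
```

With RIF=1 the term ACRcurrent + RIF*PCR is at least PCR, so after clamping the result depends only on the returned ER (and on the MCR and PCR bounds).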
In [9] above, a schematic diagram of ABR flow control algorithm as applied to a single output-buffered ATM switch is described in detail. FIG. 3 is a schematic illustration of a single output-buffered ATM switch where an embodiment of the invention is implemented. It should be noted that the invention itself applies to any ATM switch architecture and is not restricted to output-buffered switches only. The invention typically resides at each queuing point of an ATM switch. Referring to FIG. 3, a switch fabric 40 connects a pair of ports 42, each having a queue 44 at its egress. The data cell flow and RM cell flows associated with one direction of an ABR VC are shown in differing solid lines with an arrow.
In summary, the flow control comprises four processes, namely, a traffic monitoring process, a measurement interval process, a forward RM cell process, and backward RM cell process.
The traffic monitoring process is concerned with monitoring the traffic (ABR and otherwise) being received and dequeued, and updating various cell counts. These cell counts are used by other processes.
In the measurement interval process, the output rate available to ABR is determined and a load factor is updated. This involves steps to determine the ABR input rate, the ABR output rate, and the input and/or output rates of other traffic classes. One aspect of this flow control is that it takes into account not only the ABR input rate, but also the ABR output rate. This indirectly takes into consideration the ABR cells already in the queue, especially in situations where ABR cells accumulate in the queue because of, e.g., VBR traffic. Another aspect of the flow control is that, when determining the rate available to ABR, it takes into account both the input and output rates of VBR as well as previous samples of the rate available to ABR.
In the forward RM cell process, forward RM cells are intercepted and their CCR fields are read. The read CCRs are used to update certain link and VC variables. In particular, a moving average of the CCRs seen on each VC is maintained. Furthermore, a moving average of the CCRs seen on all VCs (but without their MCR components) is also maintained; this link variable is called the mean elastic rate. Also, a moving average of the weights of all seen ABR VCs is maintained. Thus, one aspect of the flow control is the averaging of the CCRs seen on a VC. By doing so, certain oscillation problems are avoided without resorting to approaches which require scanning of all ABR VCs at the end of each measurement interval. Another aspect of the flow control is the subtraction of MCR from CCR before triggering an update to the link-wide average of CCRs (i.e. the mean elastic rate). This achieves an MCR-based fairness. A third aspect is the averaging of the weights of seen ABR VCs.
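The three averages maintained by the forward RM cell process can be sketched as follows. This is an assumption-laden illustration rather than the method of [9]: the class name `ForwardRmState`, the exponential form of the moving average, the smoothing factor `alpha`, and the initial values are all hypothetical, and any conditions that [9] may place on triggering an update are not reproduced.

```python
class ForwardRmState:
    """Link and per-VC averages updated from forward RM cells (sketch)."""

    def __init__(self, alpha=1 / 16):
        self.alpha = alpha            # assumed smoothing factor
        self.vc_ccr_avg = {}          # per-VC moving average of CCR
        self.mean_elastic_rate = 0.0  # link-wide average of (CCR - MCR)
        self.mean_weight = 0.0        # link-wide average of VC weights

    def _ema(self, old, sample):
        # Exponential moving average: move a fraction alpha toward the sample.
        return old + self.alpha * (sample - old)

    def on_forward_rm(self, vc_no, ccr, mcr, weight):
        # The first CCR seen on a VC seeds that VC's average.
        previous = self.vc_ccr_avg.get(vc_no, ccr)
        self.vc_ccr_avg[vc_no] = self._ema(previous, ccr)
        # MCR is subtracted before the link-wide update, so only the
        # elastic (non-guaranteed) part of the rate is averaged.
        self.mean_elastic_rate = self._ema(self.mean_elastic_rate, ccr - mcr)
        self.mean_weight = self._ema(self.mean_weight, weight)
```

Because each average is updated incrementally as a cell arrives, no scan over all ABR VCs is needed at the end of a measurement interval.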
In the backward RM cell process, backward RM cells are intercepted to update their ER fields. One aspect of the flow control is its main equation of calculating the ER for a VC. That equation utilizes the VC's MCR, the VC's weight, the moving average of CCRs seen on this VC, the link's mean elastic rate, and the link's load factor. Another aspect of the flow control is the differentiation between two link states when calculating the ER for a given VC: excessively overloaded (when the load factor is less than some threshold), and otherwise. This differentiation is found to be extremely important to prevent the network from being locked into the wrong set of rate allocations and to allow newly active ABR VCs to acquire their fair share of the bandwidth. Through yet another aspect of the flow control, the rate allocation may be further adjusted based on how much total rate (on all ABR VCs) is allowed as compared to the output rate actually available to ABR; this is controlled through a maximum overbooking factor.
As seen in the above description, this earlier flow control mechanism, described in the applicant's U.S. Pat. No. 5,754,530, measures and utilizes rates, both aggregate and per-connection. In summary, according to the described flow control mechanism, the main equation for updating the ER field for ABR connection [vc_no] is: ##EQU1## MER is a running exponentially weighted moving average of the elastic rates (elastic rate = non-guaranteed rate = actual input rate - MCR) seen on all connections. It is updated with every forward RM cell received. IR[vc_no] is the actual input rate measured on the virtual connection VC[vc_no]. UF is an underload factor for the link. It is also mentioned that MER does not represent the "real" average of the rates of "active" connections. Rather, it is biased towards larger connections; this bias is believed to improve fairness and to help prevent the network from being locked into the wrong set of rate allocations, which may leave connections that are initially rate-limited unable to acquire their fair share of the bandwidth when they become more active.