The invention relates to a method and system for buffering data packets at a queuing point in a digital communications device such as a network node.
In order to effect statistical multiplexing in a store and forward digital communications device, such devices will typically queue data packets for subsequent processing or transmission in a common storage resource such as a memory buffer. At such a queuing point, the common storage resource may be shared by traffic streams associated with various quality of service classes, interface ports, and aggregate pools or groups of traffic flows. With traffic of such a multi-faceted nature, such communication devices will often employ some type of congestion control system in order to ensure that the common storage resource is "fairly" allocated amongst the various traffic streams.
For example, in an asynchronous transfer mode (ATM) communication system, the most elemental traffic stream is a virtual connection (VC) which may belong to one of a number of different types of quality of service categories. The ATM Forum Traffic Management working group has defined five (5) traffic classes or service categories, which are distinguished by the parameter sets which describe source behaviour and quality of service (QoS) guarantees. These categories include constant bit rate (CBR), real time variable bit rate (rtVBR), non-real time variable bit rate (nrtVBR), available bit rate (ABR), and unspecified bit rate (UBR) service categories. The ABR and UBR service categories are intended to carry data traffic which has no specific cell loss or delay guarantees. UBR service does not specify traffic related guarantees while ABR service attempts to provide a minimum useable bandwidth, designated as a minimum cell rate (MCR). The ATM Forum Traffic Management working group and International Telecommunication Union (ITU) have also proposed a new service category, referred to as guaranteed frame rate (GFR). GFR is intended to provide service similar to UBR but with a guaranteed minimum useable bandwidth at the frame level, which is mapped to the cell level by an MCR guarantee.
In an ATM device such as a network switch, the memory buffer at any given queuing point may be organized into a plurality of queues which may hold data packets in aggregate for VCs associated with one of the service categories. Alternatively, each queue may be dedicated to a particular VC. Regardless of the queuing structure, each VC represents a traffic flow and groups of VCs, spanning one or more queues, can be considered as "traffic flow sets". For instance, VCs associated with a particular service class or input/output port represent a traffic flow set. When the memory buffer becomes congested, it may be desirable to apportion its use amongst service categories, and amongst various traffic flow sets and the individual traffic flows thereof. In particular, in a network where GFR and ABR connections are contending for buffer space, it may be desired to achieve a fair distribution of the memory buffer between these service categories and between the individual traffic flows or groups thereof.
There are a number of prior art fair buffer allocation (FBA) schemes. One scheme for fairly allocating buffer space is to selectively discard packets based on policing. For an example of this scheme in an ATM environment, a packet is tagged (i.e., its cell loss priority (CLP) field is set to one) if the corresponding connection exceeds its MCR, and when congestion occurs, packets having a CLP field set to one are discarded in preference to packets having a CLP field set to zero. See ATM Forum Technical Committee, "(Traffic Management working group living list)", ATM Forum, btd-tm-01.02, July 1998. This scheme, however, fails to fairly distribute unused buffer space between connections.
Another scheme is based on multiple buffer fill level thresholds where a shared buffer is partitioned with these thresholds. In this scheme, packet discard occurs when the queue occupancy crosses one of the thresholds and the connection has exceeded its fair share of the buffer. The fair buffer share of a connection is calculated based on the MCR value of the connection and the sum of the MCRs of all active connections utilizing the shared buffer. However, this technique does not provide an MCR proportional share of the buffer because idle (i.e., allocated but not used) buffer, which can be defined as

$$\sum_{i=1}^{N} \max\left(0,\ \frac{MCR_i}{\sum_{\text{active}} MCR}\, Q_s - Q_i\right),$$

where $Q_s$ is the buffer fill level, $Q_i$ is the buffer segment count for a connection $i$, and

$$\frac{MCR_i}{\sum_{\text{active}} MCR}\, Q_s$$

is the fair share of buffer allocated to the connection, is distributed at random between the connections.
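By way of illustration only, the fair share and idle buffer quantities defined above may be evaluated as in the following Python sketch. The function names and the sample MCR and occupancy values are hypothetical and are chosen solely to make the arithmetic easy to follow.

```python
def fair_shares(mcr, q_s):
    """Fair share of each active connection: MCR_i / sum(MCR) * Q_s."""
    total = sum(mcr)
    return [m * q_s / total for m in mcr]


def idle_buffer(mcr, q, q_s):
    """Sum over all connections of max(0, fair_share_i - Q_i)."""
    shares = fair_shares(mcr, q_s)
    return sum(max(0.0, share - q_i) for share, q_i in zip(shares, q))


# Hypothetical values: three active connections sharing one buffer.
mcr = [10, 30, 60]      # minimum cell rates of the active connections
q = [50, 300, 650]      # buffer segment count Q_i per connection
q_s = sum(q)            # buffer fill level Q_s

shares = fair_shares(mcr, q_s)
idle = idle_buffer(mcr, q, q_s)
```

In this example the first connection uses only 50 of its 100 allocated segments, so 50 segments are idle, and the third connection happens to hold all of them. Nothing in the threshold scheme ties that outcome to the MCR of the connection that benefits.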
Another scheme for fairly allocating buffer space through selective discard is based on dynamic per-VC thresholds. See Choudhury, A. K., and Hahne, E. L., "Dynamic Queue Length Threshold in a Shared Memory ATM Switch", Proceedings of IEEE Infocom 96, March 1996, pages 679 to 686. In this scheme the threshold associated with each VC is periodically updated based on the unused buffer space and the MCR value of a connection. Packet discard occurs when the VC occupancy is greater than the VC threshold. This method reserves buffer space to prevent overflows. The amount of reserved buffer space depends on the number of active connections. When there is only one active connection, the buffer is not fully utilized, i.e., full buffer sharing is not allowed.
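The under-utilization noted above can be seen in the following Python sketch, which is offered as an illustration only. It assumes that the per-VC threshold takes the form alpha times the unused buffer space, with alpha standing in for an MCR-derived weight, and that a single connection sends cells continuously with no departures. The threshold form, the function name and the numeric values are assumptions and are not taken from the cited reference.

```python
def settled_occupancy(buffer_size, alpha, arrivals):
    """Occupancy reached by one VC under a dynamic threshold (assumed form)."""
    q = 0
    for _ in range(arrivals):
        # Assumed threshold: a multiple of the currently unused buffer space.
        threshold = alpha * (buffer_size - q)
        # Discard occurs when the VC occupancy is greater than the threshold.
        if q > threshold:
            continue
        q += 1
    return q


# A lone connection with alpha = 1 settles near half of a 1000-cell buffer.
single = settled_occupancy(buffer_size=1000, alpha=1.0, arrivals=5000)
# A larger weight raises the occupancy but still leaves space in reserve.
weighted = settled_occupancy(buffer_size=1000, alpha=2.0, arrivals=5000)
```

Under these assumptions the single connection never occupies more than about half of the buffer (two thirds when alpha is 2), however long the overload persists.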
The above-mentioned prior art does not fairly distribute unused buffer space between connections or traffic flow groups, and in particular does not provide MCR proportional fair share distribution of the buffer. Some prior art FBA schemes also do not allow for full buffer sharing. Another drawback with some prior art FBA schemes is the fact that the issue of multiple traffic flow groups contending for the same buffer resource is not addressed. The invention seeks to overcome or alleviate some or all of these and other prior art limitations.
One aspect of the invention relates to a method of partitioning a memory buffer. The method involves defining a hierarchy of memory partitions, including at least a top level and a bottom level, wherein each non-bottom level memory partition consists of one or more child memory partitions. The size of each top-level memory partition is provisioned, and a nominal partition size for the child partitions of a given non-bottom level memory partition is dynamically computed based on the congestion of the given memory partition. The size of each child memory partition is dynamically computed as a weighted amount of its nominal partition size. These steps are iterated in order to dynamically determine the size of each memory partition at each level of the hierarchy. The memory partitions at the bottom-most level of the hierarchy represent space allocated to (individual or aggregate) traffic flows, and the size of each bottom-level partition represents a memory occupancy threshold for the traffic flow.
The memory partitions are preferably "soft" as opposed to "hard" partitions in that if the memory space occupied by packets associated with a given partition exceeds the size of the partition then incoming packets associated with that partition are not automatically discarded. In the embodiments described herein, each memory partition represents buffer space allocated to a group or set of one or more traffic flows at various levels of granularity. For instance, a third level memory partition may be provisioned in respect of all traffic flows associated with a particular egress port, and a second level memory partition may be associated with a subset of those traffic flows which belong to a particular service category. Therefore, the size of a given partition can be viewed as a target memory occupancy size for the group of traffic flows corresponding to the given partition. At the lowest level of the hierarchy, however, the partition size functions as a threshold on the amount of memory that may be occupied by an (individual or aggregate) traffic flow. When this threshold is exceeded, packet discard is enabled. In this manner, aggregate congestion at higher levels percolates down through the hierarchy to affect the memory occupancy thresholds of individual traffic flows. The net result is the fair distribution of unused buffer space between groups of traffic flows and the individual members thereof.
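The hierarchical partitioning described above may be pictured with the following minimal Python sketch. It is a sketch under stated assumptions and not the disclosed implementation: the `Partition` class, the `update` and `discard_enabled` functions, the two-level example and the simple shrink-or-grow rule used for the nominal size are all hypothetical. In particular, capping the nominal size at the parent's own size is an assumption made so that a lone child may grow to use the whole parent partition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    name: str
    weight: float = 1.0      # weight applied to the parent's nominal size
    occupancy: int = 0       # measured memory in use by this partition
    size: float = 0.0        # provisioned (top level) or computed (children)
    nominal: float = 0.0     # nominal size offered to this partition's children
    children: List["Partition"] = field(default_factory=list)


def update(p: Partition, grow: float = 1.25, shrink: float = 0.5) -> None:
    """One top-down pass: adjust nominal sizes, then resize the children."""
    if not p.children:
        return
    if p.occupancy > p.size:
        # Parent is congested: offer its children less space.
        p.nominal *= shrink
    else:
        # Parent has room: offer more, never beyond the parent's own size.
        p.nominal = min(p.nominal * grow, p.size)
    for child in p.children:
        child.size = child.weight * p.nominal
        update(child, grow, shrink)


def discard_enabled(leaf: Partition) -> bool:
    """Bottom-level partition size acts as an occupancy threshold."""
    return leaf.occupancy > leaf.size


# Hypothetical two-level hierarchy: one port partition with two flows.
flow_a = Partition("flow A", weight=1.0, occupancy=700)
flow_b = Partition("flow B", weight=0.5, occupancy=400)
port = Partition("port", occupancy=1100, size=1000.0, nominal=1000.0,
                 children=[flow_a, flow_b])

update(port)  # port holds 1100 > 1000, so the nominal size halves
congested = [(c.size, discard_enabled(c)) for c in port.children]

flow_a.occupancy, flow_b.occupancy, port.occupancy = 400, 200, 600
update(port)  # congestion has cleared, so the nominal size grows again
relieved = [(c.size, discard_enabled(c)) for c in port.children]
```

The port partition is soft: exceeding its size discards nothing directly, and only lowers the thresholds of the flows beneath it. The same recursion applies unchanged to deeper hierarchies, since each partition carries only its own size, occupancy and nominal value.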
Another aspect of the invention relates to a method of buffering data packets. The method involves: (a) defining a hierarchy of traffic flow sets, including at least a top level and a bottom level, wherein each non-bottom level traffic flow set comprises one or more child traffic flow subsets; (b) provisioning a target memory occupancy size for each top-level traffic flow set; (c) dynamically determining a target memory occupancy size for each traffic flow set having a parent traffic flow set based on a congestion of the parent traffic flow set; (d) measuring the actual amount of memory occupied by the packets associated with each bottom level traffic flow; and (e) enabling the discard of packets associated with a given bottom level traffic flow set in the event the actual memory occupancy size of the corresponding bottom level traffic flow exceeds the target memory occupancy size thereof.
In the embodiments described herein, the target memory occupancy size for a given traffic flow set is preferably computed by first computing a nominal target occupancy size for the child traffic flow sets of a common parent. The target memory occupancy size for each such child traffic flow set is then set to a weighted amount of the nominal target occupancy size. The nominal target occupancy size for a given group of child traffic flow sets preferably changes in accordance with a prespecified function in response to the congestion of their common parent traffic flow set. For instance, the embodiments described herein deploy geometric and decaying exponential functions for computing the nominal target occupancy based on the congestion, i.e., the disparity between the target and measured memory occupancy sizes, of a traffic flow set.
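The two families of functions mentioned above might take shapes such as those in the following Python sketch. The exact forms, the parameter names (`ratio`, `k`, `ceiling`) and the sample values are assumptions made for illustration; the text above states only that geometric and decaying exponential functions of the congestion are used.

```python
import math


def geometric_step(nominal, target, measured, ceiling, ratio=0.5):
    """Assumed geometric rule: scale by `ratio` when congested, else by 1/ratio."""
    if measured > target:
        return nominal * ratio
    return min(nominal / ratio, ceiling)


def exponential_nominal(target, measured, ceiling, k=4.0):
    """Assumed decaying exponential in the relative overshoot of the parent."""
    overshoot = max(0.0, measured - target) / target
    return ceiling * math.exp(-k * overshoot)


# Geometric rule driven by four successive occupancy measurements of a
# parent traffic flow set whose target occupancy is 1000.
nominal = 1000.0
trace = []
for measured in (1200, 1100, 900, 800):
    nominal = geometric_step(nominal, target=1000, measured=measured,
                             ceiling=1000.0)
    trace.append(nominal)

# Exponential rule evaluated at no overshoot and at 25 percent overshoot.
at_target = exponential_nominal(target=1000, measured=1000, ceiling=1000.0)
over_target = exponential_nominal(target=1000, measured=1250, ceiling=1000.0)
```

The geometric rule keeps one state variable per parent and reacts step by step, whereas the exponential rule is stateless and depends only on the current disparity, which makes it a natural candidate for a lookup table.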
In the disclosed embodiments the invention is implemented within the context of an ATM communications system. In these embodiments, the comparison specified in step (e) is carried out prior to or upon reception of the first cell of an ATM adaptation layer 5 (AAL5) frame in order to effect early packet discard in accordance with the outcome of the comparison.
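A minimal Python sketch of early packet discard driven by such a comparison follows. The `VcState` class, the `on_cell` function and the cell counts are hypothetical, and the sketch assumes that the caller has already decoded the AAL5 end-of-frame indication carried in the cell header.

```python
from dataclasses import dataclass


@dataclass
class VcState:
    threshold: float          # target memory occupancy size for this flow
    occupancy: int = 0        # cells currently queued for this flow
    mid_frame: bool = False   # True once the first cell of a frame was seen
    discarding: bool = False  # decision taken at the first cell of the frame


def on_cell(vc: VcState, end_of_frame: bool) -> bool:
    """Return True if the arriving cell is queued, False if it is discarded."""
    if not vc.mid_frame:
        # First cell of a frame: compare occupancy with the threshold once,
        # and let the outcome govern every cell of this frame.
        vc.discarding = vc.occupancy > vc.threshold
        vc.mid_frame = True
    accepted = not vc.discarding
    if accepted:
        vc.occupancy += 1
    if end_of_frame:
        vc.mid_frame = False
    return accepted


vc = VcState(threshold=3, occupancy=3)
# A three-cell frame followed by a two-cell frame (True marks the last cell).
cells = [False, False, True, False, True]
results = [on_cell(vc, eof) for eof in cells]
```

The first frame is accepted whole even though the occupancy passes the threshold part-way through it, and the second frame is discarded whole. No partially queued frame is left to waste buffer space.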
The buffering system of the invention scales well to large systems employing many hierarchical levels. This is because there are relatively few state variables associated with each hierarchical level. In addition, most computations may be performed in the background and lookup tables may be used, thereby minimizing processing requirements on time critical packet arrival. The buffering system also enables full buffer sharing, as discussed by way of an example in greater detail below.