This invention is in the field of digital asynchronous transfer mode (ATM) networking. It relates to network capability and network utilization, and concerns a switching method for realizing a specific capability.
One class (Class I) of ATM connections is already known through the Recommendations of the International Telecommunications Union (ITU) relating to the Broadband Integrated Services Digital Network, and through the Specifications of the ATM Forum. For this class, specific per-connection resource provisions are made in terms of bandwidth and switch-buffer allocation, and delivery is assured both in occurrence and within a specified or lesser time. Bandwidth or rate descriptors already recommended for Class I connections by the ITU or ATM Forum are Peak Cell Rate and Sustainable Cell Rate. The other class (Class II), brought into possible existence by our invention, is of ATM connections for which no specific per-connection resource provisions are made, on which delivery is assured in occurrence while assurance on time of delivery is in a statistical sense only, and which are subject to control originating within the switching element of the switch, that control being based only on the aggregate switch buffer utilization of the Class's traffic.
An example of an ATM switch which can support Class I connections is the Prelude switch of France Telecom CNET (e.g. J.-P. Coudreuse and M. Servel, "Prelude: an asynchronous time-division switched network", ICC'87, Seattle, 1987; French Patent no. 82 22 226, Dec. 29, 1982, "Systeme de Commutation de paquets synchrones de longueur fixe", Publication no. 2 538 976).
The Prelude is a 16-port switch, switching at a rate of 260 Mbps. Writing from each input port is effectively in turn, with cells being written into consecutive locations of a (circular) buffer. The pointer to each written-in cell is placed into the FIFO stack(s) of the one (or more) output(s) that is (are) meant to receive it. The reading is also effectively in turn by each output port, taking the next cell pointer from its FIFO stack and outputting the pointed cell. The fill of a FIFO stack at a given time represents the backlog of cells still to be read by the given output port. The capacity of a FIFO stack could typically be for, say, 50 pointers. To accommodate the worst case, the capacity of the (circular) buffer would be for 50 × 16 = 800 cells.
The Prelude switch can switch only 'resourced' connections, or what we here call Class I ATM connections. The connections are resourced in the sense that all connections $C_{ijk}$, where subscript $i$ signifies that the connection enters on input port $i$, subscript $j$ signifies that it is to be switched to output port $j$, and $k$ signifies that it is the $k$-th connection that goes from $i$ to $j$, have individual peak cell rate limits $r_{ijk}$ such that

$$\sum_{i}\sum_{k} r_{ijk} < R_j\,(1 - m) \qquad \text{(EQ 1)}$$

where $R_j$ is the maximum rate at which cells can be output by port $j$, and $m$ is a safety margin. The size of the safety margin depends on the number of connections, the capacity of the FIFO stack, and the magnitude of the tolerable probability that the stack should overflow. A typical size of margin would be 0.15, corresponding to a tolerable probability of overflow of less than $10^{-13}$.
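The condition of EQ 1 can be illustrated with a minimal sketch. The function name, the rate values, and the use of arbitrary rate units are assumptions made for illustration; only the inequality itself and the typical margin of 0.15 come from the text.

```python
def class1_admissible(peak_rates, port_rate, margin=0.15):
    """Return True if the summed peak cell rates satisfy EQ 1.

    peak_rates: peak cell rate limits r_ijk of all Class I connections
                routed to one output port j (over all inputs i and all k).
    port_rate:  maximum output rate R_j of that port, in the same unit.
    margin:     safety margin m (hypothetical default taken from the text).
    """
    return sum(peak_rates) < port_rate * (1.0 - margin)


# Hypothetical numbers: an output port that can emit 100 rate units.
print(class1_admissible([30, 25, 20], 100))      # 75 is below 85
print(class1_admissible([30, 25, 20, 15], 100))  # 90 is not below 85
```

The sketch only tests the inequality; it does not model how the margin would be derived from the number of connections, the FIFO capacity, or the tolerable overflow probability.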
We note that with service for Class I ATM connections only, the total network capacity is perforce underutilized, minimally by the extent of the safety margin. A potentially much larger underutilization will be associated with the observation that a large proportion of the connections will not at all times, or continually, send cells into the network at their given peak rates $r_{ijk}$, but will instead exhibit random intervals of inactivity. The average cell flow $\mu_{ijk}$ on $C_{ijk}$ will in a significant percentage of cases be much smaller than $r_{ijk}$, and in consequence

$$\sum_{i}\sum_{k} \mu_{ijk} < \sum_{i}\sum_{k} r_{ijk} < R_j\,(1 - m) \qquad \text{(EQ 2)}$$
All the capacity that cannot be, or is not, utilized is wasted. With a measure of foresight, it has been called 'available capacity'. As explained in greater detail below, an embodiment of the present invention has the effect of making the 'available' capacity in fact available.
It is an object of the present invention to provide a method of switching in an ATM network which, when used in conjunction with known methods of dynamic flow control on the network links, gives the network the capability of supporting two classes of ATM connections.
The dynamic flow control on the network links applies only to traffic cells on Class II connections and is credit based. Within Class II, control is exercised on the whole class of cells rather than on a per-connection basis within the class. A Class II cell may be transmitted only if a transmit credit is available for it. Credit(s) is (are) returned to the transmit end of the link, to replenish those available at initialization, whenever a Class II cell is read into the switching element of the switch from the Input Port Controller (IPC) Board. Return of credits is made possible by the assumption that the links comprise two-way pairs of unidirectional elements, with one element in each direction.
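The credit mechanism described above might be modelled, at the transmit end of one link, roughly as follows. The class and method names are hypothetical, and the sketch assumes one credit per Class II cell; Class I cells are not subject to this control and are not modelled.

```python
class CreditLink:
    """Transmit end of one link carrying Class II cells under credit control."""

    def __init__(self, initial_credits):
        # Credits made available at initialization (value is an assumption).
        self.credits = initial_credits

    def send_cell(self):
        """Try to send one Class II cell; it goes out only if a credit exists."""
        if self.credits == 0:
            return False          # no credit: the cell must wait
        self.credits -= 1
        return True

    def credit_returned(self, count=1):
        """Far end read `count` Class II cells into its switching element."""
        self.credits += count


link = CreditLink(initial_credits=2)
print(link.send_cell(), link.send_cell(), link.send_cell())  # True True False
link.credit_returned()
print(link.send_cell())                                      # True
```

Because the credit count is kept for the class as a whole, the sketch holds a single counter per link rather than one per connection.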
In an embodiment of the switching method, the switching element of the switch immediately transfers to a common shared memory every traffic cell offered to it by the IPC Board, and appends a pointer (pointing to that cell) to the Class I or Class II queue (as appropriate) for the output port(s) for which the cell is destined. These queues form part of the switching element. At a read-out opportunity for an output port, pointers are read from the Class I queue in first-in-first-out (FIFO) order at link rate; they are read from the Class II queue also in FIFO order and at link rate, but only while the Class I queue is empty and the Output Port Controller (OPC) will accept a Class II cell. A signal from the OPC to the switching element, indicating that it cannot accept a Class II cell, implies that the output port has exhausted its supply of credits. Given a pointer, the cell is read from the pointed location and output to the OPC Board.
When a cell has been read by an output, or in the case of a multicast cell when it has been read by all concerned outputs, its location is returned to the pool of free shared memory locations. With each input port there is associated a Backlog number, being the number of Class II cells resident in the shared memory which were received into the switch by that input port. While that Backlog number is at or above the quota value for that input, the switching element sends a Stop signal to the IPC Board in every cell period, by which the IPC controls the transfer of Class II cells from the IPC Board to the switching element.
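The shared-memory embodiment can be sketched as a toy model. All names below are hypothetical. The sketch assumes that the Backlog number counts Class II cells (the class to which the Stop signal applies), that a multicast cell is counted once, and that the IPC honours Stop so the free pool is never exhausted.

```python
from collections import deque


class SwitchingElement:
    """Toy model of the shared-memory switching element (illustrative only)."""

    def __init__(self, n_ports, memory_size, quotas):
        self.memory = {}                        # pointer -> stored cell record
        self.free = deque(range(memory_size))   # pool of free memory locations
        # Per output port: one pointer queue for Class I, one for Class II.
        self.queues = [{1: deque(), 2: deque()} for _ in range(n_ports)]
        self.backlog = [0] * n_ports            # resident Class II cells per input
        self.quotas = list(quotas)

    def stop(self, in_port):
        """Stop signal for an input: Backlog at or above that input's quota."""
        return self.backlog[in_port] >= self.quotas[in_port]

    def write(self, in_port, cls, out_ports, payload):
        """Store a cell and append its pointer to each destination queue."""
        ptr = self.free.popleft()               # assumes a free location exists
        self.memory[ptr] = {"payload": payload, "in": in_port,
                            "cls": cls, "pending": len(out_ports)}
        for out in out_ports:
            self.queues[out][cls].append(ptr)
        if cls == 2:
            self.backlog[in_port] += 1
        return ptr

    def read(self, out_port, opc_accepts_class2=True):
        """One read-out opportunity: Class I first, else Class II if allowed."""
        q1, q2 = self.queues[out_port][1], self.queues[out_port][2]
        if q1:
            ptr = q1.popleft()
        elif q2 and opc_accepts_class2:
            ptr = q2.popleft()
        else:
            return None
        cell = self.memory[ptr]
        cell["pending"] -= 1
        if cell["pending"] == 0:                # read by all concerned outputs
            del self.memory[ptr]
            self.free.append(ptr)               # location returns to the pool
            if cell["cls"] == 2:
                self.backlog[cell["in"]] -= 1
        return cell["payload"]


sw = SwitchingElement(n_ports=2, memory_size=8, quotas=[1, 1])
sw.write(0, 2, [1], "II-a")
sw.write(0, 1, [1], "I-a")
print(sw.stop(0))   # True: Backlog 1 has reached quota 1
print(sw.read(1))   # I-a  (Class I has absolute priority)
print(sw.read(1))   # II-a
print(sw.stop(0))   # False
```

Keeping pointers rather than cells in the queues lets a multicast cell occupy one memory location while appearing in several output queues; the `pending` counter is one possible way to know when all concerned outputs have read it.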
Alternative embodiments of the invention exist. One such embodiment is similar to the embodiment described above, but lacks a common shared memory within the switching element of the switch. Instead, cells which are accepted by the switching element are transferred physically to the appropriate FIFO output queues, which hence are queues of cells rather than of pointers. In this alternative embodiment the Backlog number associated with an input port is the total number of cells in the Class II FIFO output queues which were received into the switch by that input port. Whether all copies or only one copy of any multicast cells are counted in the Backlog number is optional.
The maintenance of two classes of switch queues, with absolute service priority for Class I at output ports, and the keeping of Backlog number and quota value for input ports are of key significance to the embodiment of the invention. Separation of queues makes service and service quality on Class I connections independent of traffic on Class II connections, and queueing specifically at output makes the service quality the best possible for any given traffic shape and intensity and the service capacity available to Class II the maximum possible. Application of Stop control on the basis of traffic backlog makes it possible to prevent loss by buffer overflow. Further, making the Stop control for each input port independent, and with a limit on backlog associated with that input, allows for fair sharing among all inputs of the available Class II service capacity. The invention provides a per-class control which offers per-connection fairness.
At an abstract level, the method may be viewed as embodying some principles in common with the distributed queueing protocol (e.g. described by John L. Hullett and Robert M. Newman in "Queueing Protocol", U.S. Pat. No. 4,922,244, May 1, 1990) and with staged queueing (e.g. described by Zigmantas L. Budrikis, Antonio Cantoni, and John L. Hullett in "Distributed Queue Dual Tree Digital Network" (DQDT), AIPO PCT/AU94/00XXX, Jun. 30, 1994). Whereas in these antecedents service is in each case by an implicit single server, in the method described here it is by explicit multiple dedicated servers. But similar to the antecedents, the service is regular FIFO with limited participation in the queues, and backpressure control ensuring loss-free, minimum latency transfer.
Also similar to principles in the extended queue protocol described by Zigmantas L. Budrikis et al. (loc. cit.), the service may in all instances be of two queues at different priorities, where participation in the queues of the higher priority is limited by negotiated peak rates, ex source, on individual ATM connections, and is not subject to the backpressure control. With the extended queue discipline, transfers at the higher priority have guarantees of bandwidth and of limit on delay.
The shared memory ATM switch that, in concert with other switches with identical function and terminals with analogous control functionality, can implement the method of our invention, will have a memory that is notionally partitioned into separate areas (quotas), one area for each input port. It will have two FIFO queues associated with each output port, a FIFO queue for Class I and a FIFO queue for Class II. Cell write-in opportunities will occur periodically for all inputs, and cell read-out opportunities will occur periodically for all outputs. The ATM level protocol information will include a Stop_bit, by which flow control will be exercised on a link-by-link basis.
The backpressure control applies to the aggregated traffic on Class II connections into each input port, based on obtained service from outputs for its onward disposal. Let the traffic intensity on Class II connections arriving at input port $i$ for transmission to output port $j$ be $\rho_{ij}$. The total Class II traffic at input port $i$ is $\sum_{j} \rho_{ij}$, while the total traffic offered to output port $j$ is $\sum_{i} \rho_{ij}$.
Assuming sufficiently large quotas and statistically stable traffic flows, the switch would forward all traffic without exercising Stop control at any of the inputs if, and only if, for all $j$

$$\sum_{i} \rho_{ij} < \Bigl(R_j - \sum_{i}\sum_{k} \mu_{ijk}\Bigr) = A_j \qquad \text{(EQ 3)}$$

where $A_j$ is the available capacity at output port $j$.
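A small sketch may make the bookkeeping of EQ 3 concrete. The function names and all numbers are hypothetical; the traffic matrix is taken as a list of rows, one row per input port and one column per output port.

```python
def available_capacity(port_rate, class1_mean_flows):
    """A_j: output rate R_j minus the summed average Class I flows mu_ijk."""
    return port_rate - sum(class1_mean_flows)


def no_stop_needed(rho, capacities):
    """True if, for every output j, the column sum of rho is below A_j (EQ 3)."""
    return all(sum(row[j] for row in rho) < capacities[j]
               for j in range(len(capacities)))


# Hypothetical 2 x 2 case.
a = [available_capacity(100, [30, 20, 10]),   # 40 left at output 0
     available_capacity(100, [50, 45])]       # 5 left at output 1
print(no_stop_needed([[10, 2], [20, 1]], a))  # column sums 30 and 3
print(no_stop_needed([[10, 2], [20, 4]], a))  # column sums 30 and 6
```

The check presumes constant intensities; with randomly varying traffic the same comparison would only hold in an average sense.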
In reality it must be expected that the inequality (EQ 3) will not hold for all outputs, and that Stop control will be exerted at one or more input ports. The effect of the control is to modify the aggregate traffic on Class II connections into input port $i$ by multiplication by a factor $g_i$, $0 < g_i \le 1$, so that for all $j$

$$\sum_{i} g_i\,\rho_{ij} < A_j \qquad \text{(EQ 4)}$$
For the special case when the traffic intensities and available capacities are constants, that is, all flows are at constant bit rates and the available capacities are fixed over an extended duration of time, the factors $g_i$ can be found simply from equilibrium conditions. For instance, consider the case where up to time $t = 0$ EQ 3 held for all $j$, and at $t = 0$ one or more of the traffic flows or of the available capacities changed to new constants so that the inequalities of EQ 3 still hold for all but one particular output $j_0$. Then the aggregate flows would potentially be down-sized only in those inputs that send any traffic to $j_0$, and of these, assuming that all inputs have equal and adequate quotas, only those that contribute more than their fair share of the traffic to $j_0$.
Pursuing the example further, the input ports are ranked in ascending order with respect to their traffic to output $j_0$. The total of $n$ inputs is divided into two groups: the "under fair-share" group of $m$ inputs, for which

$$\sum_{i=1}^{m} \rho_{ij_0} \le A_{j_0} - (n - m)\,\rho_{mj_0} \qquad \text{(EQ 5)}$$

and

$$\sum_{i=1}^{m+1} \rho_{ij_0} > A_{j_0} - (n - m - 1)\,\rho_{(m+1)j_0} \qquad \text{(EQ 6)}$$

and the "over fair-share" group of the remaining (highly ranked) $n - m$ inputs. Then, dictated by equilibrium,

$$g_i = \begin{cases} 1, & i = 1, 2, \ldots, m \\[4pt] \dfrac{A_{j_0} - \sum_{t=1}^{m} \rho_{tj_0}}{(n - m)\,\rho_{ij_0}}, & i = m+1, \ldots, n \end{cases} \qquad \text{(EQ 7)}$$

where the dummy variable $t$ has been used in place of $i$ as the first subscript in the summation, and still designates the input port.
Consider a numerical example of a 5 × 5 switch, designated as switch S, having $A_j = 10$ for all $j$, and a stable traffic matrix for $t < 0$

$$\rho_{ij}(t < 0) = \begin{bmatrix} 0 & 2 & 2 & 1 & 3 \\ 1 & 0 & 2 & 2 & 0 \\ 1 & 2 & 0 & 1 & 1 \\ 3 & 1 & 3 & 0 & 1 \\ 2 & 1 & 2 & 3 & 0 \end{bmatrix}$$
An increase of the offered traffic element $\rho_{23}$ from 2 to 6 at $t = 0$ will result in the unstable traffic matrix

$$\rho_{ij}(0) = \begin{bmatrix} 0 & 2 & 2 & 1 & 3 \\ 1 & 0 & 6 & 2 & 0 \\ 1 & 2 & 0 & 1 & 1 \\ 3 & 1 & 3 & 0 & 1 \\ 2 & 1 & 2 & 3 & 0 \end{bmatrix}$$

that at initial equilibrium at the switch produces $g_2 = 0.5$, and the matrix

$$\rho_{ij}(t > 0) = \begin{bmatrix} 0 & 2 & 2 & 1 & 3 \\ 0.5 & 0 & 3 & 1 & 0 \\ 1 & 2 & 0 & 1 & 1 \\ 3 & 1 & 3 & 0 & 1 \\ 2 & 1 & 2 & 3 & 0 \end{bmatrix}$$
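The value $g_2 = 0.5$ can be reproduced with a short sketch of the ranking and equilibrium rule of EQ 5 to EQ 7. The function name is hypothetical, and the code assumes constant intensities, a positive available capacity, and a single overloaded output, as in the special case discussed above.

```python
def fair_share_factors(column, capacity):
    """Return the factors g_i for the inputs feeding one output j0.

    column:   offered Class II intensities rho_{i j0}, one entry per input.
    capacity: available capacity A_{j0} of that output (assumed positive).
    """
    n = len(column)
    if sum(column) < capacity:            # EQ 3 holds: nothing is throttled
        return [1.0] * n
    order = sorted(range(n), key=lambda i: column[i])   # ascending rank
    ranked = [column[i] for i in order]
    # Largest m for which EQ 5 holds; EQ 6 then holds for that same m.
    m = 0
    for cand in range(1, n):
        if sum(ranked[:cand]) <= capacity - (n - cand) * ranked[cand - 1]:
            m = cand
        else:
            break
    # Capacity left after the under fair-share group, split evenly (EQ 7).
    share = (capacity - sum(ranked[:m])) / (n - m)
    g = [1.0] * n
    for i in order[m:]:
        g[i] = share / column[i]
    return g


rho = [[0, 2, 2, 1, 3],
       [1, 0, 6, 2, 0],
       [1, 2, 0, 1, 1],
       [3, 1, 3, 0, 1],
       [2, 1, 2, 3, 0]]
g = fair_share_factors([row[2] for row in rho], 10)   # traffic to output 3
print(g)                                  # [1.0, 0.5, 1.0, 1.0, 1.0]
print([g[1] * x for x in rho[1]])         # [0.5, 0.0, 3.0, 1.0, 0.0]
```

For output 3 the ranked intensities are 0, 2, 2, 3, 6, so $m = 4$ (the first four sum to 7, which equals $10 - 1 \times 3$), and the remaining capacity of 3 is all that input port 2 may use out of the 6 it offers. Because Stop control acts on the whole Class II aggregate of an input, the entire second row is scaled by 0.5.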
If the link terminating in port 2 in the above example is to another switch, say switch T, then, by the reduction of traffic on the link to port 2 of Switch S, the traffic at switch T is made unstable and subsequently is made stable by similar reduction of the major input tributary or tributaries to that output link. The process of flow reduction by Stop control continues backward through the switches until it reaches the primary source or sources of the excess traffic, originating in one or more terminals. If all primary traffic flows, except the primary flow(s) reduced by the Stop control that propagates from switch S, remain at the same values as at t=0, then with high probability the traffic flows into input port 2 of S that go to other than output port 3 are restored in time to their values at t=0.
The final stable traffic matrix at switch S is with high probability

$$\rho_{ij}(t \to \infty) = \begin{bmatrix} 0 & 2 & 2 & 1 & 3 \\ 1 & 0 & 3 & 2 & 0 \\ 1 & 2 & 0 & 1 & 1 \\ 3 & 1 & 3 & 0 & 1 \\ 2 & 1 & 2 & 3 & 0 \end{bmatrix}$$
While formally the final stable traffic matrix is indicated as occurring in the limit as time approaches infinity, the actual elapsed time before this state is reached can be quite short. It will depend on the size of the quotas, the size of the excess traffic, the number of switching stages back to the primary sources, and the physical lengths of the links involved. The larger the excess traffic, the proportionally shorter the time to final stability. Conversely, that time grows in direct proportion to the size of the quotas, the number of switching stages, and the physical lengths of the links involved.
Nothing or very little can be done about physical lengths or the number of switching stages. Quota size is the only design parameter. To achieve rapid response, it should be small; to achieve maximum fairness in service to inputs, it should be large. Ultimately, both considerations are related to performance, and the quotas should be chosen so as to achieve optimum performance. To do this, a better understanding of Class II traffic characteristics will be required than is now available. As yet only indicative pointers can be given.
Differences in the values assigned as quotas for the input ports on a particular switch are significant as a factor in the sharing among input ports of the available forwarding capacity, and may be used to this end. If all input ports have equal quotas, the sharing in available capacity will, at least approximately, be also equal. If one input port is allocated a larger quota than others, then it will, again approximately, claim a proportionally larger percentage of available capacity. The larger the memories, in absolute terms, the closer these approximations, but at the same time the less the dynamic agility in the control to push back the necessary rate adjustments to the originating sources and thereby maximize the total network throughput.
The consideration of fair apportionment among inputs of the service available from outputs is similarly problematical. Fair apportionment, coupled to rapid push-back of control to primary sources, will give the best utilization of available capacity. For the special case of traffic flows composed of constant rate tributaries, only little memory is required to ensure fair apportionment. An amount, in numbers of cells, of the order of $n^2$, where $n$ is the number of ports on the switch, will be more than sufficient. However, in practice the offered traffic on Class II ATM connections can rarely be expected to be constant; more credibly it must be expected to be random in time and without any stationarity even in a statistical sense. Very large, equally limited memories would in all circumstances ensure fair apportionment over time, but the large memories will make the push-back of control to primary sources slow.