The present invention relates generally to data communications networks and more particularly relates to a method for detecting and recovering from signaling congestion in a connection oriented network such as an Asynchronous Transfer Mode (ATM) network.
Currently, there is a growing trend to make Asynchronous Transfer Mode (ATM) networking technology the base of future global communications. ATM has already been adopted as a standard for broadband communications by the International Telecommunications Union (ITU) and by the ATM Forum, a networking industry consortium.
ATM originated as a telecommunication concept defined by the Comite Consultatif International Telegraphique et Telephonique (CCITT), now known as the ITU, and the American National Standards Institute (ANSI) for carrying user traffic on any User to Network Interface (UNI) and to facilitate multimedia networking between high speed devices at multi-megabit data rates. ATM is a method for transferring network traffic, including voice, video and data, at high speed. Using this connection oriented switched networking technology centered around a switch, a great number of virtual connections can be supported by multiple applications through the same physical connection. The switching technology enables bandwidth to be dedicated for each application, overcoming the problems that exist in a shared media networking technology, like Ethernet, Token Ring and Fiber Distributed Data Interface (FDDI). ATM allows different types of physical layer technology to share the same higher layer, namely the ATM layer.
ATM uses very short, fixed length packets called cells. The first five bytes, called the header, of each cell contain the information necessary to deliver the cell to its destination. The cell header also provides the network with the ability to implement congestion control and traffic management mechanisms. The fixed length cells offer smaller and more predictable switching delays as cell switching is less complex than variable length packet switching and can be accomplished in hardware for many cells in parallel. The cell format also allows for multi-protocol transmissions. Since ATM is protocol transparent, the various protocols can be transported at the same time. With ATM, phone, fax, video, data and other information can be transported simultaneously.
ATM is a connection oriented transport service. To access the ATM network, a station requests a virtual circuit between itself and other end stations, using the signaling protocol to the ATM switch. ATM provides the User Network Interface (UNI) which is typically used to interconnect an ATM user with an ATM switch that is managed as part of the same network.
Networks that are connection oriented typically have two stages for connecting network users from point to point. The first stage in the establishment of the connection utilizes some form of signaling mechanism and in the second stage, data is transferred via the connection established in the first stage.
An example of such a connection oriented network is an ATM network. In the first stage, virtual connections are created using a complicated signaling/routing protocol such as Q.SAAL, Q.93, IISP, and/or PNNI between peer network nodes along the connection path to provide network users a service for establishing a connection to another network user. This connection is termed a Switched Virtual Connection (SVC) and, once created, is used as the data path between the users that have been connected.
The connection originator uses the signaling protocol to convey the service details it is requesting the network to provide, e.g., destination address (the called address), class of service (CoS), traffic descriptor, protocol to be used by the virtual connection, network transit, etc. In addition, the originator provides information about itself, in particular, its own address (the calling address).
Once the network receives the request from the originator user, it attempts to find a route to the destination that has sufficient resources to fulfill the specific characteristic requirements of the request as provided by the originating user. If the network finds a satisfactory route with the necessary resources to establish the connection, and if the called user also has sufficient resources to establish the connection, the connection is then established. Once the route is established, data can flow between source and destination over the connection.
Such a network may carry another type of connection known as a Permanent Virtual Circuit (PVC), which is typically established under manual management control. The service provided by PVCs and SVCs is the same, the difference being their method of establishment.
The signaling/routing protocol used typically consumes a high percentage of the computation resources in a node, which makes the connection establishment process slow. PVCs, as an alternative to SVCs, are set up via management in a manual fashion on each network node along the path. The PVC connections are typically stored in the system memory within the nodes making up the connection and are recreated in the event one or more portions of the connection go down. The connections are recreated and restored automatically, quickly and without the overhead of the signaling and routing protocol.
In the course of network operations, SVCs may be constantly created and torn down. SVC connections may be created very quickly and last for a relatively short duration, i.e., on the order of hundreds of milliseconds or seconds, before being removed. In many networks today, however, SVCs serve to connect well known services located in the network to well known clients also connected to the network. These connections are utilized as effectively permanent connections: once established, they may not be taken down for days, weeks, or months. In many cases, SVCs are established on a permanent basis, whereby they are never taken down and remain up until the occurrence of a network failure.
A block diagram illustrating an example ATM network comprising a plurality of switches serving to connect a source and destination end station is shown in FIG. 1. The example network, generally referenced 10, comprises an ATM network 24 consisting of end stations 12 labeled end station A and B, edge devices 14 labeled edge device A and B and a plurality of ATM switches 16 labeled ATM switch #1 through #5.
As described previously, in ATM networks, signaling is used as the main method of creating and terminating VCC connections. The connections created are used as the infrastructure to applications located at the higher layers. Examples of higher layer applications include LANE, MPOA, etc.
A block diagram illustrating a call control software/hardware application within an ATM switch and the plurality of signaling entities established and operative under its control is shown in FIG. 2.
With reference to FIGS. 1 and 2, the call control model shown, generally referenced 30, is used for signaling in ATM switches wherein each switch comprises N ports (input and output). The call control entity 32 is shown communicating with a plurality of signaling entities 34 labeled signaling entity #1 through signaling entity #N. Each signaling entity 34 functions to establish, terminate and maintain SVCCs using standards based interface signaling specifications such as UNI v3.0 or 4.0, PNNI signaling, etc.
The call control entity 32 functions to provide routing, bandwidth management and hardware programming services to the SVCCs. A key assumption made by the switch, however, is that the signaling is a reliable service. In other words, when a signaling Protocol Data Unit (PDU) is generated by the upper signaling application layer and passed to lower layers for transmission, it is assumed that the PDU was successfully transmitted to the destination via the network. The signaling entity represents a state machine at an upper layer, i.e., layer 3, which functions to create and terminate connections. A layer 2 application functions as a data link layer and provides services to the Layer 3 signaling above in a reliable manner.
In normal operation of the switch, the data link layer restricts the rate of transmission of signaling PDUs over each port using a sliding or moving window transmission technique, a technique that is well known in the communication arts. The function of the sliding window transmission technique is to ensure that the transmitter does not overflow the receiver. Windowing involves limiting the number of packets/messages/PDUs that can be transmitted before an acknowledgement is received from the receiver. Receipt of an acknowledgement causes the window to move or slide, thus permitting additional messages to be transmitted.
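The windowing behavior described above can be sketched as follows. This is a minimal illustration, not the switch's actual data link implementation; the class and method names are hypothetical.

```python
from collections import deque

class SlidingWindowTx:
    """Minimal sliding-window transmitter sketch: at most window_size
    PDUs may be outstanding (sent but unacknowledged) at any time."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.outstanding = 0      # sent, awaiting acknowledgement
        self.queue = deque()      # PDUs held back by a closed window

    def send(self, pdu):
        # Transmit immediately only if the window is open;
        # otherwise the PDU is held in the transmit queue.
        if self.outstanding < self.window_size:
            self.outstanding += 1
            return True           # handed to the link for transmission
        self.queue.append(pdu)
        return False              # queued, awaiting a window slot

    def ack(self):
        # An acknowledgement slides the window, freeing a slot;
        # a queued PDU (if any) is promoted into the opened slot.
        self.outstanding -= 1
        if self.queue:
            self.queue.popleft()
            self.outstanding += 1
```

The transmit queue in this sketch is the same queue that, under sustained congestion, grows without bound and exhausts the signaling buffer pool, as described below.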
In certain cases, however, large volumes of signaling traffic may be routed towards a particular egress link. This may happen, for example, when a network comprises hundreds of LECs which, upon powerup of the network, all attempt to connect at the same time to the LECS. In such cases of high volumes of signaling traffic, a congestion state starts to develop wherein signaling messages (PDUs) that are outstanding, i.e., that have not yet been sent, begin to be held in internal transmitter queues.
If, however, the transmitter continues to remain in the congested state for a long period of time, the switch will eventually reach a starvation point whereby not enough buffers are available for signaling. In a typical switch, a large pool of memory is provided that is used by the controller to carry out the various tasks and functions of the switch. A portion of this memory pool is designated for use as buffers for signaling messages.
While in the congestion state, the transmitter cannot transmit messages and thus places them in signaling message buffers assigned from the memory pool allotted to signaling messages. If the transmitter remains in the congestion state, the supply of signaling buffers declines. Since the number of available signaling message buffers is always limited (regardless of how much memory the switch has), a point is eventually reached whereby no free signaling message buffers are available.
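The buffer pool depletion described above can be modeled with a simple fixed-size pool whose free ratio is tracked for congestion detection. This is an illustrative sketch under stated assumptions, not the switch's memory manager; all names are hypothetical.

```python
class SignalingBufferPool:
    """Fixed-size pool of signaling message buffers. When the pool
    is exhausted, allocation fails and the caller has no choice but
    to drop the PDU: the failure mode described in the text."""

    def __init__(self, total_buffers):
        self.total = total_buffers
        self.free = total_buffers

    def alloc(self):
        if self.free == 0:
            return False          # starvation point: PDU would be dropped
        self.free -= 1
        return True

    def release(self):
        self.free = min(self.free + 1, self.total)

    def free_ratio(self):
        # Fraction of the pool still available; this is the quantity
        # compared against the low/high watermarks used for detection.
        return self.free / self.total
```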
From this point on, the switch begins dropping signaling messages (PDUs) which results in severe problems, namely what is termed 'broken' calls. A broken call is a call that was not terminated properly in accordance with any standard, e.g., UNI, PNNI, etc., due to a loss of the RELEASE PDU message somewhere in the network. In most cases, this problem is not recoverable within the scope of signaling and typically causes severe problems for the higher layer applications. Thus, the dropping of signaling PDUs by the transmitter violates the assumption of reliable transmission that the upper layers in the hierarchy rely on.
What is needed, therefore, is a means within the switch for first detecting the existence of a congestion state and second for recovering and handling the congestion state situation.
The present invention is a method of detecting a signaling congestion situation in a transmitter within a switch and for handling and recovering from the signaling congestion. The invention also comprises a method for detecting the absence of a signaling congestion situation and the processing thereof. The invention is applicable to ATM switching networks wherein a sliding window technique is used in transmitting signaling or any other type of messages from a source to a destination. The invention, however, is not limited to application only to ATM networks. It is applicable to any type of communications system whereby a sliding window technique is used to transmit data from one point to another.
The method of the present invention functions to (1) monitor the level of the transmit queue for each port and (2) to monitor the level of the signaling message buffer memory pool. When either level passes predetermined thresholds, the signaling congestion state is declared. The thresholds used to determine whether a port is in the signaling congestion state are based on the size of the signaling sliding window and the number of ports within the communication device (e.g., switch).
Once the signaling congestion state is declared, the call control or equivalent entity in the communications device, e.g., the switch, stops routing new calls from/towards all ports that are in the signaling congestion state. The call control continues, however, to handle existing calls from/towards ports that are in the signaling congestion state.
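The detection rule and the routing policy described in the two paragraphs above can be sketched as follows. This is an illustrative model only; the function names are hypothetical and the threshold values shown are placeholders, not the claimed values.

```python
def is_congested(tx_queue_len, pool_free_ratio, queue_high, pool_low):
    """Declare the signaling congestion state when EITHER the transmit
    queue exceeds its high watermark OR the free share of the shared
    buffer pool drops below its low watermark (an OR condition)."""
    return tx_queue_len > queue_high or pool_free_ratio < pool_low

def handle_signaling_msg(is_new_call, port_congested):
    """Routing policy once congestion is declared: stop routing new
    calls from/towards the congested port, but continue to handle
    messages belonging to already-existing calls."""
    if port_congested and is_new_call:
        return "reject-new-call"
    return "process"
```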
Not only does the method of the present invention provide a solution to the broken call phenomenon, but it also enables more efficient management of the signaling memory buffers which results in reduced memory consumption by the switch. Thus, switches employing the method of the present invention, for the same call connection throughput and load, require less memory for signaling message buffers than those switches not employing the method.
There is thus provided in accordance with the present invention, in a communication system network including a plurality of communication devices each having one or more transmitters and receivers, each transmitter having an output port and signaling transmitter queue associated therewith, the communication system also including a memory buffer pool shared by a plurality of output ports, a method of detecting on an output port basis the existence of and recovering from a congestion state, the method comprising the steps of monitoring the current length of the transmit queue, monitoring the current ratio of free buffer space available in the memory buffer pool, declaring an output port to be in the congestion state upon the length of the signaling transmit queue exceeding a first threshold or upon the ratio of available memory buffer pool space dropping below a second threshold and ceasing to route new calls from and towards an output port that is in a congestion state.
The method further comprises the step of continuing to handle already existing calls from and towards an output port in a congestion state. The first threshold comprises an upper transmit queue threshold forming part of a hysteresis mechanism for preventing oscillation into and out of the congestion state. More particularly, the first threshold may comprise an upper transmit queue threshold equal to (N−1)·window_size, wherein N is the number of ports on the communications device and window_size is the size of a sliding window mechanism used in transmitting data from the output port.
The second threshold comprises a lower memory buffer pool threshold forming part of a hysteresis mechanism for preventing oscillation into and out of the congestion state. For example, the second threshold may comprise a lower memory buffer pool threshold equal to 15%. The method further comprises the step of taking an output port out of the congestion state when the signaling transmit queue length drops below a third threshold and the ratio of available buffer space in the memory buffer pool exceeds a fourth threshold. The third threshold may comprise a lower transmit queue threshold forming part of a hysteresis mechanism for preventing oscillation into and out of the congestion state. More particularly, the third threshold comprises a lower transmit queue threshold equal to ((N−1)/2)·window_size,
wherein N is the number of ports on the communications device and window_size is the size of a sliding window mechanism used in transmitting data from the output port. The fourth threshold comprises an upper memory buffer pool threshold forming part of a hysteresis mechanism for preventing oscillation into and out of the congestion state. For example, the fourth threshold comprises an upper memory buffer pool threshold equal to 25%.
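The hysteresis mechanism built from these four thresholds can be sketched as a small state machine using the formulas and percentages given above. This is an illustrative Python model, not the claimed implementation; the class and method names are hypothetical.

```python
class CongestionMonitor:
    """Hysteresis sketch using the stated thresholds: enter congestion
    above (N-1)*window_size queued PDUs OR below 15% free pool space;
    exit only once the queue drops below ((N-1)/2)*window_size AND
    free pool space exceeds 25%. The asymmetric enter/exit conditions
    prevent oscillation into and out of the congestion state."""

    def __init__(self, num_ports, window_size):
        n = num_ports
        self.queue_high = (n - 1) * window_size        # first threshold
        self.queue_low = ((n - 1) / 2) * window_size   # third threshold
        self.pool_low = 0.15                           # second threshold
        self.pool_high = 0.25                          # fourth threshold
        self.congested = False

    def update(self, tx_queue_len, pool_free_ratio):
        if not self.congested:
            # Enter: either condition alone suffices (OR).
            if (tx_queue_len > self.queue_high
                    or pool_free_ratio < self.pool_low):
                self.congested = True
        else:
            # Exit: both conditions must hold (AND).
            if (tx_queue_len < self.queue_low
                    and pool_free_ratio > self.pool_high):
                self.congested = False
        return self.congested
```

Note that a queue length between the low and high watermarks changes nothing in either direction, which is precisely the hysteresis band.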
The method further comprises the steps of attempting to route a new call to an output port not in the congestion state and rejecting a call which cannot be routed to an alternative output port not in the congestion state utilizing a unique RELEASE CAUSE operative to notify the rest of the network that the call was rejected due to a temporary congestion state on the output port.
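The alternative-routing step above can be sketched as follows. The function name and port representation are hypothetical, and the specific cause value is an assumption on our part: the text requires only a unique RELEASE CAUSE indicating temporary congestion, so the value 41 below is merely an illustrative placeholder.

```python
def route_new_call(candidate_ports):
    """Attempt to route a new call to the first output port not in the
    congestion state; if every candidate port is congested, reject the
    call with a release cause indicating temporary congestion so the
    rest of the network knows the rejection is transient."""
    for port in candidate_ports:
        if not port["congested"]:
            return ("routed", port["id"])
    return ("released", 41)  # hypothetical cause value for temporary congestion
```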
There is further provided in accordance with the present invention, in a communication system including a plurality of communication devices each having one or more transmitters and receivers, each transmitter having an output port and signaling transmitter queue associated therewith, the communication system also including a memory buffer pool shared by a plurality of output ports, a method of taking an output port currently in the congestion state, out of the congestion state, the method comprising the steps of monitoring the current length of the signaling transmit queue, monitoring the current ratio of free buffer space available in the memory buffer pool, taking an output port out of the congestion state when the signaling transmit queue length drops below a first threshold and the ratio of available buffer space in the memory buffer pool exceeds a second threshold and resuming the routing of calls from and towards the output port upon its removal from the congestion state.
There is also provided in accordance with the present invention an apparatus for detecting and recovering from a congestion state for use in a communications device, the communications device coupled to a network, the apparatus comprising one or more transmitters each having an output port and a signaling transmit queue associated therewith, a memory buffer pool shared by a plurality of output ports, one or more signaling entities operative to establish, terminate and maintain one or more Switched Virtual Channel Connections (SVCCs), a call control entity for configuring, administering and controlling the one or more signaling entities, software means operative on the one or more signaling entities and the call control entity for: monitoring the current length of each signaling transmit queue, monitoring the current ratio of free buffer space available in the memory buffer pool, declaring an output port to be in the congestion state upon the length of the signaling transmit queue exceeding a first threshold or upon the ratio of available memory buffer pool space dropping below a second threshold and ceasing to route new calls from and towards an output port that is in a congestion state.