The present invention relates to Access Media Gateways (AGWs). More particularly, and not by way of limitation, the present invention is directed to a system and method for overload control between AGWs and the associated Media Gateway Controllers (MGCs) in Next Generation Networks (NGNs).
Abbreviations and Definitions
a. AACC Adaptive Automatic Congestion Control
b. AGW Access Media Gateway
c. CS Call Server
d. GOS Grade of Service
e. ISUP ISDN User Part
f. MSAN Multi Service Access Node (equivalent to and interchangeable with AGW)
g. MGC Media Gateway Controller (equivalent to and interchangeable with CS)
h. NGN Next Generation Network
i. POTS Plain Old Telephone Service
FIG. 1a is a high-level block diagram of a Next Generation Network (NGN). An NGN typically contains multiple domains, each controlled by a single Call Server (CS), also known as a Media Gateway Controller (MGC). Call Servers are connected to each other and to call control nodes in peer networks, with which they exchange call-related signaling messages, and they control the gateway nodes in their domains. The gateway nodes served by these Call Servers provide bearer (transport) functionality for the media streams of calls between subscribers.
For the successful establishment of an end-to-end call, several nodes in the access and core networks must have enough spare processing resources to serve the call attempt. In numerous scenarios, such as televoting or disaster events, certain nodes become the bottleneck in the network and must reject call requests in order to preserve their integrity and stability. When the load offered to a target node rises above its engineered capacity, the node's throughput degrades significantly; moreover, an extremely high offered load may cause the target node to restart. Signaling protocols therefore need load control functions that make the source node decrease its admission rate, by rejecting calls, in order to relieve the heavy load on the congested target node.
Each Access Media Gateway (AGW) connects thousands of subscribers to the network. Simulations of a proposed European Telecommunications Standards Institute Notification Rate (ETSI_NR) control have shown that the effectiveness of control in the NGN depends on the algorithm used by the control adaptor and on the setting of the control parameters. Inappropriate choices have been shown to lead to premature termination of control during periods of overload. Overloads can be caused either by a moderate increase in traffic across all the associated AGWs at the same time or by an increase on a smaller subset of the AGWs. Normally, an AGW initiates new calls by sending off-hook notification events to a Call Server (equivalent to a Media Gateway Controller; the two terms are used interchangeably hereinafter).
FIG. 1b is a high-level functional block diagram of the overload control mechanism between an MGC and its AGWs according to the ETSI draft mentioned above (ETSI_NR). ETSI_NR describes an overload control mechanism between the MGCs and the AGWs that protects the MGCs from becoming overloaded during the mass call events described above. ETSI_NR proposes that leaky bucket restrictors be applied at the AGWs to throttle originating POTS call attempts towards the MGCs. A so-called LoadLevel supervision function in the MGC periodically measures the MGC's load state. If the LoadLevel reaches a critical value, the MGC initiates the originating call restriction mechanism at the AGWs. During periods of overload, the MGC periodically calculates a GlobalLeakRate based on the current LoadLevel. This GlobalLeakRate is then distributed among the AGWs according to their associated weights w_i; the weight set is fixed and preconfigured in the MGC. The new leak rate value (notrat) calculated for each AGW from its preconfigured weight w_i is sent to that gateway in a subsequent H.248 MODIFY command from the MGC. The notrat (Notification Rate) is the rate at which a given AGW may send the MGC off-hook notifications from terminations in the NULL context. The AGW sets the leak rate of its leaky bucket to the notrat value received from the MGC and uses this leak rate to regulate the off-hook notifications. The initial value of the GlobalLeakRate, used when overload is detected at the MGC, is an MGC configuration parameter called InitGlobalLeakRate. It is set sufficiently low to relieve congestion at the MGC immediately, and the calculated GlobalLeakRate is then expected to adapt upwards gradually to ensure high utilization of the Media Gateway Controller.
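The distribution step and the AGW-side restrictor described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ETSI_NR implementation: the function and class names, the burst tolerance `bucket_size`, and the choice to count one bucket unit per off-hook notification are hypothetical; only the relation notrat_i = GlobalLeakRate * w_i comes from the text.

```python
from dataclasses import dataclass
from typing import Dict


def distribute_leak_rate(global_leak_rate: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split the MGC's GlobalLeakRate into one notrat value per AGW: notrat_i = G * w_i."""
    return {agw: global_leak_rate * w for agw, w in weights.items()}


@dataclass
class LeakyBucketRestrictor:
    """Hypothetical AGW-side leaky bucket regulating off-hook notifications."""
    leak_rate: float          # notrat received from the MGC (notifications per second)
    bucket_size: float = 5.0  # assumed burst tolerance, not taken from ETSI_NR
    level: float = 0.0        # current bucket content
    last_time: float = 0.0    # time of the previous offer, in seconds

    def set_leak_rate(self, notrat: float) -> None:
        """Apply a new notrat received in an H.248 MODIFY command."""
        self.leak_rate = notrat

    def offer(self, now: float) -> bool:
        """Return True if an off-hook notification at time `now` may be sent to the MGC."""
        # Drain the bucket at the leak rate for the time elapsed since the last offer.
        self.level = max(0.0, self.level - (now - self.last_time) * self.leak_rate)
        self.last_time = now
        if self.level + 1.0 <= self.bucket_size:
            self.level += 1.0  # notification admitted and counted in the bucket
            return True
        return False           # notification throttled
```

For example, four equally weighted AGWs (w_i = 0.25) each receive a notrat of 100 when the GlobalLeakRate is 400; an AGW whose subscribers go off-hook much faster than its notrat then has most notifications throttled once its assumed burst allowance is used up.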
The mechanism described in the current ETSI draft may not provide appropriate protection of the Media Gateway Controllers in all cases. If the draft is implemented as currently specified, certain distributions of originating call attempts among the Access Media Gateways can be expected to mislead the adaptation algorithm and temporarily render the overload control ineffective.
Four main areas can be identified where the currently proposed control scheme has shortcomings:
Failure to tackle focused overload from a group of nodes;
Slow convergence of the control mechanism;
No interoperability with overload control solutions protecting the Media Gateway Controller from other interfaces; and
Termination of control.
If a small group of m AGWs is responsible for an overload, these m AGWs offer calls to the associated MGC at a rate determined by their “leaky bucket” restrictors (the leak rate of each restrictor is a weighted portion of the MGC's GlobalLeakRate). Suppose this group is the only one offering calls to the MGC, the remaining n AGWs offer none, and all AGWs are equally weighted (i.e., AGW weight w_i=1/(m+n)). The admitted rate is then m*G/(m+n), so if the situation persists long enough, the GlobalLeakRate G of the MGC is raised until this rate equals C, the capacity of the MGC, and G may settle at G=(C/m)*(m+n). Depending on the ratio of m to n, this can be many times the actual capacity of the MGC. Moreover, each AGW, regardless of whether it is offering calls to the MGC, receives a leak rate of G*w_i=C/m.
If traffic demand subsequently increases on the previously idle group of n AGWs, the rate of calls these AGWs offer to the MGC is limited only by their leaky buckets, i.e., to C/m each. The MGC then becomes overloaded, since the earlier active group m together with the newly activated group n offers traffic at a rate of (C/m)*(m+n), more than its engineered capacity C. This state can render the control ineffective until a Control Adaptor adjusts the GlobalLeakRate appropriately.
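The settling argument above can be checked numerically. The sketch below assumes, as the text does, equal weights w_i = 1/(m+n), saturated demand from the m active AGWs, and a Control Adaptor that has converged so that the admitted rate equals C; the function name is hypothetical.

```python
def focused_overload(capacity: float, m: int, n: int) -> tuple:
    """Return (settled G, leak rate sent to each AGW, overload factor once all AGWs are active)."""
    w = 1.0 / (m + n)              # equal weights w_i = 1/(m+n)
    # The m saturated AGWs admit m * G * w in total; the adaptor drives this to C.
    g = capacity * (m + n) / m     # G = (C/m)*(m+n)
    per_agw = g * w                # G * w_i = C/m, sent to every AGW, busy or idle
    # If the n idle AGWs become busy too, all m+n buckets admit C/m each.
    overload_factor = (m + n) * per_agw / capacity
    return g, per_agw, overload_factor
```

With m = 10 and n = 90 (10% of the AGWs active) and C = 1000 calls/s, the settled G is 10000 calls/s, and the overload factor once the idle AGWs start offering calls is (m+n)/m = 10.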
FIG. 2 is a high-level block diagram illustrating how an unevenly distributed overload of an MGC can make the control at the MGC ineffective. If the load offered to an MGC is not distributed evenly, but, e.g., group 206, consisting of AGW1 and AGW2, is responsible for an overload, the Control Adaptor (see FIG. 2) will raise the GlobalLeakRate far above the real call processing capacity of the MGC. In this scenario, the pair of AGWs in group 206 causing the overload admits calls at a rate determined by their ‘leaky buckets,’ while the AGWs belonging to group 208 offer calls far below the leak rate they have received from the MGC, so their leaky buckets do not restrict. If the traffic demand suddenly increases in the area served by group 208, the nodes in group 208 start to offer calls at the rate determined by their leaky buckets, and the MGC becomes overloaded, leaving the control ineffective for a considerable amount of time. For instance, the Media Gateway Controller can have four MSANs (AGWs) connected to it, each equally weighted because each terminates the same number of subscriber lines. When the nodes in group 206 offer calls at a higher rate than the capacity of the MGC, the MGC detects overload, sets the GlobalLeakRate to the InitGlobalLeakRate, and sends ¼ of this GlobalLeakRate value to each of the four MSANs.
The MGC then gradually increases the GlobalLeakRate in order to increase MGC utilization, and continues to do so until the total incoming rate from the MSANs reaches C, the processing capacity of the Media Gateway Controller. Since only two of the four MSANs are assumed to be responsible for the overload, the GlobalLeakRate keeps increasing until it reaches 2C. At this point, the MGC sends a leak rate of 2C/4=C/2 to each MSAN, so the two MSANs (AGWs) in group 206 together offer enough calls to saturate the Media Gateway Controller. If the nodes in group 208 then start to offer traffic, they are also each allowed to send C/2, so the total incoming rate becomes 2C, a twofold overload. This case clearly differs from the onset of overload at the initiation of control: initially the GlobalLeakRate is set to a suitably low value, whereas here the overload persists for a considerable amount of time until a downward adaptation of the GlobalLeakRate occurs.
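The four-MSAN example can be reproduced with a small discrete-time sketch. The additive increase and decrease by a fixed `step` per measurement period is an assumption made purely for illustration (the text does not specify the Control Adaptor's formula), as are the numeric defaults; group 206 is modelled as having unlimited demand and group 208 as idle until the end.

```python
def four_msan_scenario(capacity: float = 1000.0, init_global_leak_rate: float = 100.0,
                       step: float = 50.0, iterations: int = 200) -> tuple:
    """Return (settled GlobalLeakRate, load factor after group 208 becomes active)."""
    weights = [0.25, 0.25, 0.25, 0.25]               # four equally weighted MSANs
    demand = [float("inf"), float("inf"), 0.0, 0.0]  # group 206 saturating, group 208 idle
    g = init_global_leak_rate
    for _ in range(iterations):
        # Each MSAN admits the smaller of its demand and its notrat g * w_i.
        admitted = sum(min(d, g * w) for d, w in zip(demand, weights))
        if admitted < capacity:
            g += step   # hypothetical upward adaptation towards full utilization
        elif admitted > capacity:
            g -= step   # hypothetical downward adaptation
    settled_g = g
    # Group 208 now starts offering traffic as well; every bucket admits g * w_i.
    demand = [float("inf")] * 4
    admitted = sum(min(d, g * w) for d, w in zip(demand, weights))
    return settled_g, admitted / capacity
```

With the default values the GlobalLeakRate settles at 2000 calls/s (2C), and once group 208 becomes active the MGC is offered twice its capacity, matching the reasoning above.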
Another concern is whether the control can adapt quickly enough to follow changes in the offered rate. In the case of a serious focused overload, the GlobalLeakRate has to be increased to an extremely high level. For example, if 10% of the AGWs generate the overload and the CS capacity is 1000 calls/s, the GlobalLeakRate must rise to 10000 calls/s, and even with a fairly large adaptation step (e.g., 10 calls/s per second) it can take about 1000 seconds, roughly 16 minutes, to adapt to full utilization of the MGC.
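The arithmetic of this example can be written out as below. The function name is hypothetical, and a linear ramp at a constant step is assumed, as in the text's estimate; the target follows from G = (C/m)*(m+n).

```python
def adaptation_time_s(capacity: float, m: int, n: int,
                      init_global_leak_rate: float, step_per_s: float) -> float:
    """Seconds needed to ramp the GlobalLeakRate linearly up to G = (C/m)*(m+n)."""
    target = capacity * (m + n) / m  # settled GlobalLeakRate for a focused overload
    return (target - init_global_leak_rate) / step_per_s
```

With C = 1000 calls/s, m = 10 active AGWs out of 100, a near-zero initial rate and a step of 10 calls/s per second, the result is 1000 s (about 16.7 minutes); a nonzero InitGlobalLeakRate shortens this only marginally.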
The above illustration may be an extreme example, but adaptation times of several minutes are still possible. This calls into question the adaptation ability of the whole ETSI_NR algorithm: the constant provisioned weighting system has a multiplicative effect that can slow down adaptation in the case of a focused overload. The Call Server will unnecessarily reject many calls for a long period in the case of a step overload, which means a large loss of revenue, especially when the step overload is caused by, e.g., televoting, which typically carries a premium call rate. On the other hand, if the adaptation step is increased, the control will oscillate.
It is assumed that when the Call Server fails to allocate capacity for an originating call request, it rejects the request. The main purpose of overload control is to minimize the number of such rejections, allowing the CS to maximize its throughput. In NGNs, Call Servers have to serve both network-initiated and access-initiated call requests. Therefore, if the CS becomes overloaded, its own internal overload protection mechanism rejects both originating and incoming calls. Incoming call requests are initiated using the ISUP protocol from legacy POTS exchanges or are enveloped in the SIP-I protocol from other Call Servers, but other industry-standard call control protocols such as SIP or H.323 can also be used. For example, the ISUP protocol uses its own overload control mechanism, called Adaptive Automatic Congestion Control (AACC). During periods of overload, incoming and originating calls should each be guaranteed a configurable share of the admitted call stream, so interoperability between the overload control solutions (e.g., ETSI_NR and AACC) protecting the same node is crucial. The current ETSI_NR draft provides no solution to this interoperability problem. What is needed is a GlobalLeakRate calculation algorithm that updates the GlobalLeakRate in such a way that, when contending for the capacity of the CS, incoming calls from POTS exchanges and other Call Servers cannot squeeze out originating calls from the AGWs, and vice versa.
Finally, the existing solution does not handle termination of the control properly. Since call admission control is not performed at the Call Server (CS), the CS cannot know, when calculating the leak rate, whether the leaky buckets at the MSANs (AGWs) are still restricting traffic or whether the overload event has ceased. ETSI_NR suggests simply using a timer: a ‘TerminationPendingTimer’ is started when the measured LoadLevel of the Call Server falls below the GoalLoadLevel, and if the measured LoadLevel does not rise above the GoalLoadLevel during the lifetime of this timer, the control is switched off when the timer expires. However, a LoadLevel below the GoalLoadLevel does not necessarily mean that the overload has ceased; the mechanism may be over-restricting, so that the sources do not offer enough calls to the CS for overload to occur. If the control switches off while the leak rate is still adapting upwards and the overload is still present, the CS will soon be overloaded again and the control will be switched back on with the InitGlobalLeakRate, which can easily result in on-off oscillation of the control and underutilization of the CS. The required value of the GlobalLeakRate (G) depends on m and n, which makes G difficult to estimate, although it will typically need to be significantly larger than C. Under these circumstances, the time the control needs to converge to the GoalLoadLevel of the CS (MGC) may be prolonged, which makes it difficult to set the TerminationPending timer. Inappropriate choices for these parameters can exacerbate the situation and potentially lead to premature termination of the control during the overload. For instance, if the TerminationPending timer is set too short and the overload control in the MGC terminates prematurely, the MGC will see repeated undesired sudden surges of load (solid curved line).
In addition, the admitted call rate will repeatedly be lowered to the InitGlobalLeakRate, and the control will switch on and off again and again. The graph in FIG. 8 illustrates this problem.
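The on-off behaviour described above can be illustrated with a deliberately simplified model. Everything in it is an assumption for illustration: one-second measurement ticks, a fixed additive adaptation step, an unrestricted focused demand of three times the MGC capacity coming from 1 of 10 equally weighted AGWs, and a LoadLevel taken as the admitted rate divided by the capacity. It is not the ETSI_NR adaptation formula, and the function name is hypothetical.

```python
def count_activations(ticks: int, timer_ticks: int, capacity: float = 1000.0,
                      active_agws: int = 1, total_agws: int = 10,
                      demand_factor: float = 3.0, init_rate: float = 100.0,
                      step: float = 10.0, goal: float = 0.9) -> int:
    """Toy model of timer-based termination under a focused overload.

    Each tick is one measurement period. The active AGWs together want
    demand_factor * capacity calls/s; every AGW has weight 1/total_agws.
    Returns how many times the control is switched on.
    """
    demand = demand_factor * capacity
    control_on, g, timer, activations = False, init_rate, None, 0
    for _ in range(ticks):
        if not control_on:
            # Unrestricted focused demand overloads the MGC, so control restarts at init_rate.
            if demand / capacity > goal:
                control_on, g, timer = True, init_rate, None
                activations += 1
            continue
        admitted = min(active_agws * g / total_agws, demand)
        load = admitted / capacity
        if load > goal:
            g = max(g - step, 0.0)   # hypothetical downward adaptation
            timer = None             # LoadLevel above the goal cancels TerminationPending
        else:
            g += step                # hypothetical upward adaptation (over-restricting)
            if timer is None:
                timer = timer_ticks  # LoadLevel fell below GoalLoadLevel: start the timer
            timer -= 1
            if timer == 0:
                control_on = False   # timer expired: control switched off prematurely
    return activations
```

With a 30-tick timer the control is switched on 10 times in 300 ticks, because in this model the GlobalLeakRate needs about 890 ticks of upward adaptation before the LoadLevel reaches the goal; with a 1000-tick timer it is switched on only once in the same period.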
In an ideal case, at the start of an overload, the MGC enters the ‘Overloaded’ state and starts adapting the GlobalLeakRate so as to move closer to the MGC's GoalLoadLevel. If the MGC's LoadLevel falls below the GoalLoadLevel (which is highly likely in the focused overload case, as the InitGlobalLeakRate will probably cause the control to over-restrict), the MGC changes state to ‘TerminationPending’ and behaves as follows:
a. if the TerminationPending timer (set when the MGC enters the TerminationPending state) expires, the state of the MGC is changed to ‘NotOverloaded’. Throttling at an AGW is terminated upon receipt of a negative Notification Rate (notrat) value.
b. if a new terminating or outgoing call attempt is received, the MGC proceeds with the call as normal. A Distribution Function in the MGC calculates the current notrat value for that AGW from the GlobalLeakRate and sends it using an H.248 Modify command against the ROOT termination, unless the current notrat has already been sent to that AGW. To minimize the number of H.248 transactions, the MGC may nest the Modify command within the same H.248 transaction as that used to progress the call. The Distribution Function records the notrat value sent to each AGW.
c. the Control Adaptor continues to monitor the MGC LoadLevel and the Off Hook arrival rate, and periodically updates the GlobalLeakRate, subject to the following two conditions:
   1. the GlobalLeakRate does not exceed the MaxGlobalLeakRate; and
   2. if the previous change to the GlobalLeakRate was an increase and the current Off Hook arrival rate is not greater than the previous Off Hook arrival rate, the GlobalLeakRate reverts to the value in force before the previous change.
d. if the Control Adaptor detects that the LoadLevel exceeds the GoalLoadLevel, the MGC moves back to the ‘Overloaded’ state.
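The state handling in items a to d can be sketched as a small state machine. This is an illustrative sketch, not a specified API: the class and method names are hypothetical, activation is assumed to use the GoalLoadLevel as its threshold (the text only speaks of a critical value), and delivering the negative notrat on the next call attempt after the timer expires is an assumption, with -1.0 chosen as an arbitrary negative value.

```python
from enum import Enum, auto
from typing import Dict, Optional


class MgcState(Enum):
    NOT_OVERLOADED = auto()
    OVERLOADED = auto()
    TERMINATION_PENDING = auto()


class MgcOverloadControl:
    """Hypothetical sketch of the MGC state handling in items a to d."""

    def __init__(self, goal_load_level: float, termination_pending_s: float):
        self.goal_load_level = goal_load_level
        self.termination_pending_s = termination_pending_s
        self.state = MgcState.NOT_OVERLOADED
        self.deadline: Optional[float] = None
        self.last_sent: Dict[str, float] = {}  # notrat last sent to each AGW

    def on_load_level(self, load_level: float, now: float) -> MgcState:
        if load_level > self.goal_load_level:
            # Item d (and, by assumption, initial activation): back to Overloaded.
            self.state, self.deadline = MgcState.OVERLOADED, None
        elif self.state is MgcState.OVERLOADED:
            # LoadLevel fell below the goal: start the TerminationPending timer.
            self.state = MgcState.TERMINATION_PENDING
            self.deadline = now + self.termination_pending_s
        elif self.state is MgcState.TERMINATION_PENDING and now >= self.deadline:
            # Item a: timer expired, control is switched off.
            self.state, self.deadline = MgcState.NOT_OVERLOADED, None
        return self.state

    def notrat_for_call(self, agw: str, weight: float, global_leak_rate: float) -> Optional[float]:
        """Item b: notrat to nest in the H.248 Modify, or None if nothing new to send."""
        if self.state is MgcState.NOT_OVERLOADED:
            if agw not in self.last_sent:
                return None      # this AGW was never throttled
            notrat = -1.0        # negative notrat terminates throttling at the AGW
        else:
            notrat = global_leak_rate * weight
        if self.last_sent.get(agw) == notrat:
            return None          # current notrat already sent to this AGW
        self.last_sent[agw] = notrat
        return notrat
```

Keeping `last_sent` per AGW mirrors the Distribution Function's record of the notrat sent to each gateway, so repeated call attempts from the same AGW do not generate redundant Modify commands.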
These two restrictions on the growth of the GlobalLeakRate are required in order to prevent the notrat values sent to the restrictors from rising to an extent that would be problematic in the event of a sudden increase in the off-hook rate.
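Conditions 1 and 2 of item c can be sketched as a guarded update rule. The additive step towards the GoalLoadLevel is an assumption standing in for the unspecified adaptation formula, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlAdaptor:
    """Hypothetical Control Adaptor applying the two growth restrictions."""
    global_leak_rate: float
    max_global_leak_rate: float
    step: float                                    # assumed additive adaptation step
    prev_global_leak_rate: Optional[float] = None  # GlobalLeakRate before the last change
    last_change_was_increase: bool = False
    prev_offhook_rate: Optional[float] = None      # Off Hook arrival rate at the last update

    def update(self, load_level: float, goal_load_level: float, offhook_rate: float) -> float:
        old = self.global_leak_rate
        if (self.last_change_was_increase and self.prev_offhook_rate is not None
                and offhook_rate <= self.prev_offhook_rate):
            # Condition 2: the last increase did not raise the Off Hook arrival
            # rate, so restore the GlobalLeakRate in force before that change.
            new = self.prev_global_leak_rate
        elif load_level < goal_load_level:
            # Condition 1: grow towards the goal but never beyond MaxGlobalLeakRate.
            new = min(old + self.step, self.max_global_leak_rate)
        else:
            new = max(old - self.step, 0.0)  # assumed downward adaptation
        self.prev_global_leak_rate = old
        self.last_change_was_increase = new > old
        self.prev_offhook_rate = offhook_rate
        self.global_leak_rate = new
        return new
```

In this sketch an increase that produces no extra off-hook traffic is undone at the next update, so notrat values cannot drift upwards while the AGWs' buckets are not actually restricting.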
It would be advantageous to have a system and method for detecting the end of overload that overcomes the disadvantages of the prior art. The present invention provides such a system and method.