In order to save costs, telecommunications operators often require infrastructure vendors to virtualize radio and core network components. In practice, all core network components shall be able to run on the same Cloud infrastructure. Besides saving costs, e.g. by spending on uniform hardware (HW) for all core network components, operators want computing resources, such as virtualized network elements, to also utilize the benefits of a Cloud, thereby further optimizing the utilization of the available hardware resources. One such benefit is horizontal scalability, also known as scale in/out, of virtual appliances in a Cloud.
Like any ordinary (virtual) appliance, e.g. in a Cloud, core network elements shall support such horizontal scaling behavior. In practice, this means that it shall be possible to remove (scale-in) or add (scale-out) computing resources, such as virtual machines (VMs), from or to a (virtual) appliance (here, a core network element). This makes it possible to ‘shrink’ the number of computing resources required to handle the traffic during low-traffic periods (e.g. night hours), while dynamically adding new computing resources, such as VMs, according to the capacity need during high-traffic periods (e.g. daytime hours).
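The capacity-driven part of this behavior can be illustrated with simple arithmetic: the number of VMs needed is the current load divided by the per-VM capacity, rounded up. The following is a minimal sketch under stated assumptions; `CALLS_PER_VM`, `MIN_VMS`, `required_vms` and `scaling_action` are hypothetical names and figures chosen for illustration, not values from any real network element.

```python
import math

# Hypothetical figures for illustration only; real values depend on the
# network element, its VM flavour and the traffic model.
CALLS_PER_VM = 1000  # concurrent calls one VM is assumed to handle
MIN_VMS = 2          # assumed floor so the element never scales to zero


def required_vms(concurrent_calls: int,
                 calls_per_vm: int = CALLS_PER_VM,
                 min_vms: int = MIN_VMS) -> int:
    """Smallest VM count whose combined capacity covers the current load."""
    return max(min_vms, math.ceil(concurrent_calls / calls_per_vm))


def scaling_action(current_vms: int, concurrent_calls: int) -> tuple[str, int]:
    """Return ('scale-out' | 'scale-in' | 'none', number of VMs to add/remove)."""
    target = required_vms(concurrent_calls)
    if target > current_vms:
        return ("scale-out", target - current_vms)
    if target < current_vms:
        return ("scale-in", current_vms - target)
    return ("none", 0)


if __name__ == "__main__":
    print(scaling_action(current_vms=3, concurrent_calls=8500))  # daytime peak
    print(scaling_action(current_vms=9, concurrent_calls=1200))  # night hours
```

With these assumed figures the daytime case prints `('scale-out', 6)` and the night-time case prints `('scale-in', 7)`. The sketch only decides how many VMs to add or remove; it says nothing about how a VM is removed, which is where the gracefulness requirement comes in.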
For telecommunications network elements with a strict expected grade of service, the removal of a computing resource during a scale-in procedure must not cause any service disturbance. For the network elements addressed by the present invention, the main service provided is the handling of calls between subscribers. That is, the removal of a computing resource from e.g. a Mobile Switching Center Server (MSS) or a Telecommunication Application Server (TAS) must not disturb any ongoing call set-up or any ongoing active-phase call. In other words, the removal of the computing resource, such as a VM, must be graceful. For data sessions, the corresponding network element requiring such a graceful scale-in procedure is e.g. the SGSN/MME.
In order to hide this scaling of the network element's computing resources from the external world, it is well-known practice to use load balancers at the edge of the network element. Load balancers (LBs) terminate traffic from the external world and distribute the incoming traffic among the internal computing resources of the network element. In this way, load balancers can decide whether a particular computing resource may receive new call requests: an active computing resource may receive them, whereas a computing resource marked for graceful shutdown should not receive new call requests, although traffic for the ongoing sessions on that computing resource should still be directed to it.
With this logic, a computing resource marked for graceful shutdown becomes ‘empty’ after a while, once the participants have terminated the ongoing sessions it handles.
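The routing rule described above (new calls only to active resources, ongoing-session traffic always to its owning resource, shutdown only once a marked resource is empty) can be sketched as follows. This is a minimal illustration, not an actual load balancer implementation: the classes `ComputingResource` and `LoadBalancer`, their methods, and the round-robin selection for new calls are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class ComputingResource:
    """A VM inside the network element (hypothetical model)."""
    name: str
    draining: bool = False                      # marked for graceful shutdown
    sessions: set = field(default_factory=set)  # IDs of ongoing calls

    def is_empty(self) -> bool:
        return not self.sessions


class LoadBalancer:
    """Round-robin LB that skips draining resources for new calls only."""

    def __init__(self, resources):
        self.resources = list(resources)
        self.owner = {}                       # session ID -> ComputingResource
        self._next = cycle(self.resources)    # round-robin iterator

    def route(self, session_id: str) -> ComputingResource:
        # Traffic of an ongoing session always goes to the resource that owns
        # it, even if that resource is marked for graceful shutdown.
        if session_id in self.owner:
            return self.owner[session_id]
        # A new call request is only given to a resource that is not draining.
        for _ in range(len(self.resources)):
            res = next(self._next)
            if not res.draining:
                res.sessions.add(session_id)
                self.owner[session_id] = res
                return res
        raise RuntimeError("no active computing resource for new calls")

    def end_session(self, session_id: str) -> None:
        # Called when the participants terminate the call.
        res = self.owner.pop(session_id)
        res.sessions.discard(session_id)

    def mark_for_shutdown(self, res: ComputingResource) -> None:
        res.draining = True

    def can_shut_down(self, res: ComputingResource) -> bool:
        # A draining resource may be removed only once it is 'empty'.
        return res.draining and res.is_empty()


if __name__ == "__main__":
    vm1, vm2 = ComputingResource("vm1"), ComputingResource("vm2")
    lb = LoadBalancer([vm1, vm2])
    lb.route("call-A")                 # new call -> vm1
    lb.route("call-B")                 # new call -> vm2
    lb.mark_for_shutdown(vm1)
    print(lb.route("call-C").name)     # vm2: vm1 no longer takes new calls
    print(lb.route("call-A").name)     # vm1: ongoing call stays on vm1
    print(lb.can_shut_down(vm1))       # False while call-A is active
    lb.end_session("call-A")
    print(lb.can_shut_down(vm1))       # True once vm1 is empty
```

Note that `can_shut_down(vm1)` stays `False` for as long as `call-A` lasts; if that call is very long, the marked VM cannot be removed, which is exactly the limitation of this logic.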
However, the length of a call generally cannot be predicted. That is, it cannot be foreseen when an ongoing, active-phase call will be finished by the participants. Consequently, a situation may occur in which a computing resource, such as a VM, cannot be shut down because of a small number of very long call sessions.
The present specification relates to scale-in and scale-out, which are collectively referred to as horizontal scaling.
Scale-out functionality enables the capacity of Functional Units (FUs) to be increased by adding computing resources to handle increased traffic. Conversely, scale-in functionality enables the capacity of FUs to be reduced by removing computing resources from traffic. Horizontal scaling is already a de-facto operator requirement that all equipment vendors must address.
As explained above, the length of calls generally cannot be predicted. If the generic load balancer logic is followed, and the incoming traffic is shared among the internal computing resources such that all computing resources have the same CPU load and memory consumption, then a computing resource selected for graceful shutdown in low-traffic hours may be handling the same number of calls as the computing resources not selected for shutdown. That is, long calls may appear on that computing resource with the same probability as on the other computing resources. To lower this probability, improved load balancer logic is needed.
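The probability argument can be made concrete under a simplifying assumption that is not part of the original text: suppose each call is independently ‘long’ (e.g. it outlasts the period in which scale-in is desired) with the same probability p. A resource holding n calls then has at least one long call with probability 1 - (1 - p)^n. The sketch below is illustrative only; `prob_at_least_one_long_call` and the value of p are hypothetical, and the sketch does not describe any improved load balancer logic.

```python
def prob_at_least_one_long_call(n_calls: int, p_long: float) -> float:
    """Probability that at least one of n independent calls is 'long'.

    Assumes each call is independently 'long' (e.g. outlasts a chosen
    scale-in window) with the same probability p_long: 1 - (1 - p)^n.
    """
    if not 0.0 <= p_long <= 1.0:
        raise ValueError("p_long must be between 0 and 1")
    if n_calls < 0:
        raise ValueError("n_calls must be non-negative")
    return 1.0 - (1.0 - p_long) ** n_calls


if __name__ == "__main__":
    p = 0.001  # hypothetical share of very long calls
    # Equal load sharing: the VM picked for shutdown holds as many calls as
    # the others, so it is equally likely to be pinned by a long call.
    equal = [prob_at_least_one_long_call(1000, p) for _ in range(4)]
    print(equal)                                 # four identical values, about 0.63
    # A VM that had received fewer calls would be less likely to be pinned.
    print(prob_at_least_one_long_call(100, p))   # about 0.095
```

Under these assumptions, equal load sharing gives every resource, including the one selected for shutdown, the same chance of being held up by a long call; the probability only drops for a resource that has received fewer calls.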