A wireless network, illustratively a Long Term Evolution (LTE) network, may comprise groups of mobile telephones or other user equipment (UE) communicating with one or more eNodeBs, which communicate with one or more Serving Gateways (SGWs), which communicate with a Packet Data Network (PDN) Gateway (PGW), which communicates with fixed networks such as IP Multimedia Subsystem (IMS) access networks or core networks. Additionally, the LTE network includes various network elements such as Mobility Management Entities (MMEs), a Policy and Charging Rules Function (PCRF), a network management system (NMS) and so on.
In a failure scenario where an SGW loses connectivity with other nodes in the network (e.g., due to network disconnection, power failure, or a behavior triggered by partial failures), a backup SGW must take over operations. The takeover should be accomplished in an intelligent manner that avoids unreasonable spikes in resource utilization while continuing to meet reasonable user/subscriber expectations.
When the primary SGW fails, all packets destined for the failed SGW are dropped. In addition, the MME loses the path management state associated with the failed SGW and must clean up all of its active sessions. This causes the active UEs to re-connect to the network through the backup SGW or an alternate SGW. Similarly, the PGW loses its path management state toward the SGW and cleans up session state toward the IMS subsystem (from the PGW's perspective, all UE sessions are active on the PGW and into the network). As the active UEs re-attach, their state is restored to the PGW and the IMS subsystem.
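The active-session handling described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not an implementation of any MME: the `Session` and `Mme` classes, the `attach` and `on_sgw_failure` methods, and the gateway names are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    ue_id: str
    sgw: str       # name of the SGW currently serving this UE
    active: bool   # False means the UE is idle


@dataclass
class Mme:
    # Hypothetical session table keyed by UE identifier.
    sessions: dict = field(default_factory=dict)

    def attach(self, ue_id, sgw, active=True):
        self.sessions[ue_id] = Session(ue_id, sgw, active)

    def on_sgw_failure(self, failed_sgw, backup_sgw):
        """Clean up active sessions on the failed SGW and model their re-attach.

        Idle sessions are deliberately left untouched in this sketch.
        """
        dropped = [
            s.ue_id
            for s in self.sessions.values()
            if s.sgw == failed_sgw and s.active
        ]
        for ue_id in dropped:
            del self.sessions[ue_id]              # MME cleans up the session
            self.attach(ue_id, backup_sgw, True)  # UE re-attaches via the backup
        return sorted(dropped)


mme = Mme()
mme.attach("ue-1", "sgw-primary", active=True)
mme.attach("ue-2", "sgw-primary", active=False)  # idle UE
mme.attach("ue-3", "sgw-other", active=True)
reattached = mme.on_sgw_failure("sgw-primary", "sgw-backup")
```

In this model only `ue-1` is moved to the backup gateway; the idle `ue-2` still points at the failed SGW, and `ue-3`, served by a different SGW, is unaffected.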
However, since the majority of UEs are idle at any given moment, the MME will not reach out to the idle UEs to clean up their sessions when the primary SGW fails. This is because the first step in cleaning up an idle UE session is to page the idle UE, and paging every idle UE is prohibitively expensive. If an idle UE is not cleaned up, a network-initiated call cannot reach it because no network entity knows where in the network it is currently located. Moreover, the IMS subsystem cannot find the UE, and no entity is actively prompting the UE to re-identify itself. The consequence is significant: the UE will not be reachable for up to an hour or two, depending on various timers. This is unacceptable for users.
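The dependence of the outage on timers can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that an idle UE re-identifies itself only when a single periodic timer expires; the function name, its parameters, and the 3600-second value are hypothetical and are not taken from any specification.

```python
def unreachable_seconds(elapsed_since_update_s: float, periodic_timer_s: float) -> float:
    """Seconds after an SGW failure during which an idle UE stays unreachable.

    Assumes the UE becomes reachable again only when its periodic timer,
    restarted at its last update, next expires.
    """
    if not 0 <= elapsed_since_update_s <= periodic_timer_s:
        raise ValueError("elapsed time must lie between 0 and the timer length")
    return periodic_timer_s - elapsed_since_update_s


# Hypothetical one-hour timer; the UE last updated 10 minutes before the failure.
window = unreachable_seconds(elapsed_since_update_s=600, periodic_timer_s=3600)

# Worst case: the UE updated immediately before the failure.
worst_case = unreachable_seconds(elapsed_since_update_s=0, periodic_timer_s=3600)
```

Under these assumptions the first UE stays unreachable for 3000 seconds and the worst case equals the full timer length, which is why the outage can approach the timer value for a UE that went idle shortly before the failure.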