Access for mobile devices to packet-based networks, such as the Internet, through radio telecommunications networks is an important growth area for the telecommunications industry. For example, 3GPP TS (Technical Specification) 23.401 and 3GPP TS 23.060 disclose an Evolved Packet Core (EPC) network architecture in which User Equipment nodes (UEs) connect to a packet-based network through a Serving GateWay (SGW) and a Packet data network GateWay (PGW).
3GPP TS 23.007, 3GPP TS 29.274, and 3GPP TS 29.275 specify the following two mechanisms for an EPC node to detect a restart or a failure of a peer EPC node. An EPC node may be, for example, a Mobility Management Entity (MME), a Serving GPRS Support Node with an S4 interface (S4-SGSN), a SGW, and/or a PGW.
According to one mechanism, an EPC node can detect that a peer EPC node has restarted in response to a recovery Information Element (IE) that is received from the other node, such as by a GPRS Tunneling Protocol version 2 (GTPv2) message (e.g. an Echo Response). When the nodes communicate through a Proxy Mobile IP (PMIP)-based S5/S8 interface, one node can signal to the other node that it has restarted by communicating a PMIPv6 Heartbeat Response that contains a restart counter that is incremented each time the node restarts.
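The restart-counter comparison described above can be sketched as follows. This is a minimal illustration only: the class name `PeerRestartTracker`, the peer identifiers, and the choice to treat any change in the counter as a restart are assumptions made for the example, not behaviour taken from the cited specifications, and the sketch does not attempt to distinguish stale counter values from newer ones.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PeerRestartTracker:
    """Remembers the last restart counter seen from each peer (hypothetical helper)."""

    last_seen: Dict[str, int] = field(default_factory=dict)

    def observe(self, peer_id: str, restart_counter: int) -> bool:
        """Record a received counter; return True if it suggests the peer restarted."""
        previous = self.last_seen.get(peer_id)
        self.last_seen[peer_id] = restart_counter
        # First contact with this peer: there is nothing to compare against,
        # so no restart is inferred.
        if previous is None:
            return False
        # Assumption of this sketch: any change in the counter means a restart.
        return restart_counter != previous


tracker = PeerRestartTracker()
tracker.observe("sgw-1", 7)              # first value seen, no restart inferred
restarted = tracker.observe("sgw-1", 8)  # counter changed, restart inferred
```

The same comparison would apply whether the counter arrives in a GTPv2 recovery IE or in a PMIPv6 Heartbeat Response; only the message parsing, which is omitted here, would differ.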
According to another mechanism, an EPC node can detect that a peer EPC node has failed responsive to not receiving a reply to a defined number of consecutive GTPv2 Echo Request messages. When the nodes communicate through a PMIP-based S5/S8 interface, one node may conclude that another node (e.g., a SGW or PGW) has failed responsive to not receiving a reply to a defined number of consecutive PMIPv6 Heartbeat Request messages.
However, receiving no replies to GTPv2 Echo Request or PMIPv6 Heartbeat Request messages from a peer EPC node does not necessarily mean that the peer EPC node has restarted or is undergoing a restart procedure. Instead, a node can become unreachable because of other issues in the network, such as temporary transport network failures or routing misconfiguration. Therefore, the 3GPP requirements provide that it is optional for an EPC node to conclude from the absence of replies to GTPv2 Echo Request or PMIPv6 Heartbeat Request messages from a peer EPC node that the peer node has “failed” or is being “restarted”.
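The second mechanism, together with the caveat that drawing a failure conclusion is optional, might be modelled as in the sketch below. The class name `EchoFailureDetector`, the default threshold of three missed replies, and the three state labels are illustrative assumptions; the text above only refers to "a defined number" of consecutive unanswered requests.

```python
from dataclasses import dataclass


@dataclass
class EchoFailureDetector:
    """Counts consecutive unanswered Echo/Heartbeat requests (hypothetical helper)."""

    max_missed: int = 3           # assumed threshold, not a value from the specifications
    declare_failure: bool = True  # models that concluding "failed" is optional
    missed: int = 0

    def on_reply(self) -> None:
        """Any reply resets the count of consecutive misses."""
        self.missed = 0

    def on_timeout(self) -> str:
        """Register one unanswered request and return the current view of the peer."""
        self.missed += 1
        if self.missed < self.max_missed:
            return "reachable"
        # The threshold has been reached. A node configured not to draw the
        # optional conclusion only reports that the peer cannot be reached.
        return "failed" if self.declare_failure else "unreachable"


detector = EchoFailureDetector(max_missed=3)
states = [detector.on_timeout() for _ in range(3)]
```

Separating "unreachable" from "failed" in this way reflects the point made above: a transport or routing problem produces the same silence as a node failure, so a cautious node may keep its contexts rather than delete them.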
In an EPC system, pursuant to 3GPP TS 23.401 and 3GPP TS 23.060, the MME and the S4-SGSN maintain the Mobility Management (MM) context and Evolved Packet System (EPS) bearer context (Packet Data Network (PDN) connection) information. The SGW and the PGW maintain the EPS bearer context information for the UEs that are served by these nodes. Pursuant to 3GPP TS 23.007, when an EPC node restarts, it deletes all affected context information. The EPC nodes that are peers of the restarted node detect the restart (and the associated deletion of the context information) upon reception of the incremented restart counter as described above.
3GPP TS 23.007 specifies how the MM context and EPS bearer context information of the UEs is handled by an EPC node when the EPC node detects that one of its peer EPC nodes has restarted. For example, when an MME or a PGW detects that a SGW has restarted, it deletes all context information for the UEs that were being served by the restarted SGW. When an S4-SGSN detects that a SGW has restarted, the S4-SGSN deletes all the EPS bearer context information for the UEs that were being served by the restarted SGW. However, the S4-SGSN may keep the MM context information for those impacted UEs (i.e., the UEs that are still attached to the network).
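The per-node handling described above can be summarised in a short sketch. The `UeContext` record, its field names, the `handle_sgw_restart` function, and the `keep_mm_on_sgsn` flag are all hypothetical names introduced for illustration; the sketch only mirrors the behaviour stated in the preceding paragraph and is not a rendering of the specification text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UeContext:
    """Simplified per-UE state held by an EPC node (hypothetical structure)."""

    ue_id: str
    serving_sgw: str
    mm_context: Optional[dict]  # None on nodes that hold no MM context
    eps_bearers: list


def handle_sgw_restart(
    node_type: str,
    contexts: List[UeContext],
    restarted_sgw: str,
    keep_mm_on_sgsn: bool = True,
) -> List[UeContext]:
    """Return the contexts that remain after the node detects an SGW restart."""
    if node_type not in ("MME", "PGW", "S4-SGSN"):
        raise ValueError(f"unknown node type: {node_type}")
    remaining = []
    for ctx in contexts:
        if ctx.serving_sgw != restarted_sgw:
            remaining.append(ctx)  # UE not served by the restarted SGW
        elif node_type == "S4-SGSN" and keep_mm_on_sgsn:
            # The S4-SGSN drops the EPS bearer contexts but may keep the MM context.
            remaining.append(
                UeContext(ctx.ue_id, ctx.serving_sgw, ctx.mm_context, [])
            )
        # Otherwise (MME, PGW, or an S4-SGSN that keeps nothing): delete everything.
    return remaining


contexts = [
    UeContext("ue-a", "sgw-1", {"state": "attached"}, ["bearer-5"]),
    UeContext("ue-b", "sgw-2", {"state": "attached"}, ["bearer-5"]),
]
after_mme = handle_sgw_restart("MME", contexts, "sgw-1")       # only ue-b remains
after_sgsn = handle_sgw_restart("S4-SGSN", contexts, "sgw-1")  # ue-a keeps its MM context
```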
While the restart of an EPC node (e.g. SGW) could be due to hardware or software malfunctions in that node, sometimes such a restart could also be initiated intentionally by Operations and Maintenance (O&M) operators during, for example, EPC node upgrades and/or feature activation/deactivation that may require node restart. Regardless of the triggering event for a restart, the affected EPS bearer contexts and MM contexts are handled as described above. 3GPP TS 23.401 specifies a MME load balancing mechanism that allows operators to move UEs associated with a MME to another MME before planned maintenance requiring MME restart. However, there is no such traffic offloading mechanism specified in 3GPP specifications for planned SGW restarts.
Regardless of whether a SGW restart is triggered by hardware or software malfunctions or is operator initiated, whenever a SGW restart occurs the affected MM contexts and the EPS bearer contexts (PDN connections) in the MME, S4-SGSN and PGW are deleted, which can cause severe problems for the associated end-user services and for network signalling. For example, ongoing (i.e. active) packet data sessions are interrupted because of the loss of user plane bearers in the SGW. Ongoing data transfers will not be possible until the UE re-establishes the EPS bearers. End-users will not be able to use any UE-initiated Packet Switched (PS) services until the UEs re-attach and re-establish the associated EPS bearers. Similarly, any network-initiated PS services (e.g. UE-terminated Voice Over Long Term Evolution (LTE) (VoLTE) calls) will not be available to UEs.
Other deleterious effects on the EPC can include a signalling increase on the interfaces between the PGW, MME, and/or S4-SGSN and some other EPC nodes in order to clean up the associated bearers/resources in those other EPC nodes. Increased signalling can also occur on the PGW interfaces, such as to a Policy and Charging Rules Function (PCRF) (to delete IP Connectivity Access Network (IP-CAN) sessions), to Remote Authentication Dial-In User Service (RADIUS) or Dynamic Host Configuration Protocol (DHCP) servers (e.g. to release IP addresses assigned by these servers), and to charging-related servers (e.g. due to closing of Charging Data Records (CDRs)). In addition, some of these servers may also need to communicate with other nodes to perform further clean-up in the IP Multimedia Subsystem (IMS) core or application servers. For example, a PCRF may inform a Proxy Call Session Control Function (P-CSCF) about the deletion of the PDN connection.
As used herein, the term “MME/S4-SGSN” refers to a MME node and/or a S4-SGSN node. In general, the MME/S4-SGSN is configured to avoid unnecessary signalling on the radio interface. However, a SGW restart can cause a sudden increase in the signalling between the MME/S4-SGSN and the UE, between the MME/S4-SGSN and the Radio Access Network (RAN) nodes, and possibly also between the MME and the Home Subscriber Server (HSS).
End-to-end signalling through the EPC network can also increase during re-connection of the UEs to the network after SGW restarts. Re-connection of the affected UEs (i.e. the UEs that have PDN connections via the restarted SGW) to the network may be spread over time based on the rate of UE-initiated uplink packets, such as Non-Access Stratum (NAS) Service Requests and periodic Routing Area Update (RAU) or Tracking Area Update (TAU) Requests. Some proactive mechanisms may also be adopted in the MME/S4-SGSN for faster reconnection of the UEs upon detection of the SGW restart, such as MME-initiated detach signalling to the UE with a re-attach required indication, or SGSN-initiated deactivation of Packet Data Protocol (PDP) contexts with a re-activation required indication.
The following procedures may contribute to the end-to-end signalling load during the reconnection of the UEs to the network:
    signalling due to rejection of NAS messages (such as Service Request, TAU Request or RAU Request signalling) from the UEs whose MM and/or EPS bearer contexts have been deleted in the network;
    signalling due to re-attach and re-establishment of the PDN connection(s) and any dedicated EPS bearer(s) required by any specific applications in the UEs; and
    re-establishment of the application level signalling between UEs and the application servers, such as for IMS-based services.
If the signalling load upon detection of an SGW restart is not well managed by the PGW, MME, and S4-SGSN, it might lead to congestion, overload, and/or instability in the Public Land Mobile Networks (PLMNs) where the SGW has active PDN connections.