Packet data traffic is growing very quickly in mobile communications networks (mobile operator networks); in many cases it grows much faster than the rate at which the operator can expand its network capacity. This leads to more frequent network congestion, which occurs when the offered traffic is higher than what the RAN (radio access network) is able to carry. Also, new services appear often, which may lead to a situation in which a new QoE (Quality of Experience) requirement has to be introduced into the network quickly. In this situation, operators need efficient and flexible tools with which they can control how the bottleneck RAN capacity is shared so that they can maximize the Quality of Experience of their users.
Recently, in the context of the 3GPP UPCON (User plane congestion management) work item, a new type of solution has been put forward which utilizes congestion feedback from the RAN to the CN (Core Network). This has been documented in 3GPP TR 23.705 version 0.10.0. When the RAN indicates congestion to the CN, the CN can take actions to mitigate the congestion, such as limiting some classes of traffic or requesting that some other classes of traffic be delayed.
The RAN OAM (Operation and Maintenance) systems contain a lot of information from which an operator may derive when a state of congestion occurs. Such information can include, for example, data about the amount of packet loss, packet delay, traffic throughput, air interface utilization, number of connected users, number of connected users with non-empty buffers, etc. A mobile network operator may configure thresholds on one of these metrics or on a combination of them to determine when its network is considered to be in a state of congestion. It is also possible for a mobile operator to define multiple levels of congestion using a combination of these metrics, so that the congestion mitigation actions can correspond to the current level of congestion.
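As an illustration of such threshold-based classification, the following is a minimal sketch rather than an actual OAM implementation. The metric names, the threshold values, and the rule that all thresholds of a level must be met are assumptions chosen for the example; an operator could combine the metrics differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMetrics:
    packet_loss_ratio: float          # fraction of packets lost, 0.0 to 1.0
    air_interface_utilization: float  # fraction of radio resources in use, 0.0 to 1.0
    users_with_nonempty_buffers: int  # connected users that have data waiting


# Hypothetical operator-configured thresholds, ordered from most to least severe.
# Each entry: (level, min loss ratio, min utilization, min users with data waiting)
THRESHOLDS = [
    (2, 0.05, 0.95, 50),
    (1, 0.01, 0.85, 20),
]


def congestion_level(m: CellMetrics) -> int:
    """Return 0 (no congestion) or the highest level whose thresholds are all met."""
    for level, loss, util, users in THRESHOLDS:
        if (m.packet_loss_ratio >= loss
                and m.air_interface_utilization >= util
                and m.users_with_nonempty_buffers >= users):
            return level
    return 0
```

With these assumed values, a cell with 6% loss, 97% utilization and 60 active users is classified as level 2, while the same loss at 50% utilization is classified as not congested.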
Current RAN OAM systems work on a per cell or lower spatial granularity. That means that the determination of congestion could be performed on a per cell basis, or for a group of cells (such as cells belonging to the same eNB (eNode B) for LTE (Long Term Evolution), or cells belonging to the same Service Area for 3G). In order for the core network to take appropriate mitigation action, the core network also has to find out which UEs (User Equipments/mobile entities) are located in a given cell. Hence, the list of affected UEs needs to be determined for the cells which are considered congested based on OAM data.
One solution for OAM based congestion reporting is documented in solution 1.5.5 (also called off-path solution) in section 6.1.5.5 of 3GPP TR 23.705, which suggests a new interface Nq for this purpose. The Nq interface is defined between a new network entity RCAF (RAN Congestion Awareness Function) and the MME (Mobility Management Entity). The RCAF is the node which is assumed to receive RAN congestion related data from the RAN OAM system on a per cell (or lower) spatial granularity. Then, using the Nq interface, the RCAF queries the MME to supply the list of UEs per cell.
A similar approach is suggested for the 3G case, using the Nq′ interface from the RCAF to the SGSN (Serving GPRS Support Node). However, there is a difference for 3G, since the RAN can already have the UE identities, as the IMSI (International Mobile Subscriber Identity) can be sent to the RNC (Radio Network Controller) node. The RAN OAM collects these IMSIs and then supplies to the RCAF the list of UEs, identified by IMSI, that are affected by congestion. Hence, for 3G it is suggested that the list of UEs affected by congestion is known to the RCAF without contacting the SGSN over the Nq′ interface.
Once the RCAF node has collected information about the set of UEs affected by congestion, it notifies the PCRF (Policy and Charging Rules Function) about the congestion level of the affected UEs (identified by a UE identifier such as the IMSI). The Np interface is defined between the RCAF and the PCRF for this purpose. As described in TR 23.705, the PCRF can then take actions to mitigate the congestion, e.g., by limiting the traffic in an enforcement node such as the PGW (Packet Data Network Gateway) or the TDF (Traffic Detection Function), or by notifying the application function (AF) to limit or delay the traffic.
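The data flow described above (per-cell congestion levels from the RAN OAM, the list of UEs per cell obtained over Nq or from OAM, and per-UE reports towards the PCRF) can be sketched as follows. This is only an illustration: the function name, the dictionary shapes, and the rule of keeping the highest level when a UE is listed under several cells are assumptions, not part of TR 23.705.

```python
from typing import Dict, List


def per_ue_congestion(
    cell_levels: Dict[str, int],
    ues_by_cell: Dict[str, List[str]],
) -> Dict[str, int]:
    """Map each UE (by IMSI) in a congested cell to that cell's congestion level."""
    result: Dict[str, int] = {}
    for cell_id, level in cell_levels.items():
        if level == 0:
            continue  # only congested cells produce per-UE reports
        for imsi in ues_by_cell.get(cell_id, []):
            # Assumption: if a UE is listed under several cells, keep the highest level.
            result[imsi] = max(level, result.get(imsi, 0))
    return result


# Hypothetical input: levels derived from OAM data, UE lists as returned per cell.
cell_levels = {"cell-A": 2, "cell-B": 0, "cell-C": 1}
ues_by_cell = {
    "cell-A": ["imsi-001", "imsi-002"],
    "cell-B": ["imsi-003"],
    "cell-C": ["imsi-004"],
}
reports = per_ue_congestion(cell_levels, ues_by_cell)
```

Note that the UE in the uncongested cell does not appear in the result at all, which is the property that later complicates mobility handling.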
One problem with the congestion notifications from the RCAF to the PCRF is the handling of UE mobility. There may be multiple RCAF nodes in a network, each corresponding to a certain geographical area. It may also be possible that a given RCAF handles a single RAT (radio access technology) type only, such as LTE only or 3G only, and the UE may move between the RATs. As a result of UE mobility between different RCAFs, the PCRF may receive notifications from multiple RCAFs for a given UE, and it may not always be possible to know which information is the latest.
This is complicated by additional factors. Firstly, the RCAF may get periodic information about UEs on a longer time scale, such as every 15 minutes, and consequently the RCAF may perform the reporting to the PCRF only after some delay. Different RCAFs are not synchronized, so it may happen that when the UE moves from RCAF1 to RCAF2, the reporting from RCAF2 takes place earlier than the reporting from RCAF1. Hence the ordering of the incoming congestion notifications at the PCRF may not reflect the ordering of UE mobility events.
Secondly, the RCAF may only know about a given UE if it is affected by congestion. For a UE that is not affected by congestion, the RCAF may not get information for that UE via OAM or via Nq. Therefore it may happen that the UE moves from RCAF1 to RCAF2 and is affected by congestion at RCAF1 but not at RCAF2, so that RCAF2 does not produce any congestion indication to the PCRF. This may lead to the PCRF incorrectly believing that the UE still experiences congestion at RCAF1.
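Both effects can be reproduced with a small sketch. It assumes a naive PCRF that simply believes whichever notification arrived last; the tuple layout, the function name, and the times (in minutes) are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# (arrival time at the PCRF in minutes, reporting RCAF, congestion level) for one UE.
Notification = Tuple[int, str, int]


def pcrf_view(notifications: List[Notification]) -> Optional[Tuple[str, int]]:
    """Naive PCRF: believe whichever notification arrived last."""
    latest = None
    for _arrival, rcaf, level in sorted(notifications):
        latest = (rcaf, level)
    return latest


# Case 1: the UE is congested (level 2) under RCAF1, then moves to RCAF2 (level 1).
# RCAF2 happens to report at minute 7, RCAF1 only at minute 12, after the move.
reordered = [(12, "RCAF1", 2), (7, "RCAF2", 1)]

# Case 2: the UE moves to RCAF2 and is not congested there, so RCAF2 stays silent.
silent = [(12, "RCAF1", 2)]
```

In both cases `pcrf_view` ends with the RCAF1 report, although the UE has already left RCAF1: arrival order alone does not recover the order of the mobility events.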
One additional aspect to consider in the solution of these problems is that the signaling load on the Np interface between the RCAF and the PCRF may be significant. There may be a high number of UEs in a network, and it is possible that the congestion state changes for a substantial fraction of the UEs. Hence it is desirable to limit the signaling load on the Np interface.
The following solution approaches for UE mobility handling have been suggested.
Approach 1. In 3GPP TR 23.705 version 0.10.0, a validity time is associated with the information sent from the RCAF to the PCRF on the Np interface. It is stated that “When this time has elapsed and no further congestion information has been received, the congestion is assumed to be over.” Such a validity time can be used in the PCRF to prevent the PCRF from permanently assuming that the UE is affected by congestion after it has moved to another RCAF where it is not affected by congestion.
Approach 2. In protocols handling mobility, it is common to use timestamps to signal the ordering of the events in the receiving node. E.g., timestamps can be used as one of the options in the PMIPv6 mobility protocol (RFC 5213 from August 2008).
Approach 3. In protocols handling mobility, it is also common to use sequence numbers to signal the ordering of the events in the receiving node. E.g., sequence numbers can be used as one of the options in the PMIPv6 mobility protocol (RFC 5213).
Approach 4. The Intra-LTE TAU (Tracking Area Update) and Inter-eNodeB Handover with Serving GW (Gateway, SGW) Relocation procedure with a PMIP (Proxy Mobile IP)-based S5 interface is defined in 3GPP TS 23.402 version 12.4.0, section 5.7.1. That procedure includes moving the Gxc session (i.e., the GW control session) from an old SGW to a new SGW. The Gxc session is terminated in the PCRF, so the scenario is similar to mobility handling at the Np interface, where the endpoint is also the PCRF.
Approach 5. In signaling procedures between the MME and the HSS (Home Subscriber Server) for mobility (see e.g., 3GPP TS 23.401 version 12.4.0, section 5.3.3.1 describing the TAU procedure), the HSS sends a Cancel location to the old MME when it receives a mobility update (Update location) from a new MME. This is used to release some of the context information in the old MME.
The following problems are seen with the existing solutions described above.
Approach 1. Using a validity time would work well only if the congestion ended exactly when the validity time expires. If the congestion ends at any other time, this approach does not perform well. In case the congestion ends before the validity timer expires, the CN throttling actions are maintained unnecessarily, degrading the end user performance. In case the congestion ends after the validity timer expires, new signaling is necessary to maintain the CN action, which can lead to excessive signaling. Given that the length of the congestion period cannot be accurately predicted in advance, these issues are expected to degrade the performance of this solution.
Note also that the congestion status may change between different levels, and those changes are not handled by validity timers which only consider the transition to no congestion state. Hence, the gain potential of the validity timer approach is very limited, and the risk of performance loss is higher.
An additional problem with the validity timer based approach is that the PCRF may receive congestion information from more than one RCAF node, and it is possible that there are multiple such pieces of congestion information whose validity timers have not yet expired. In that case, it is problematic for the PCRF to determine which is the actual congestion level. Some heuristic needs to be used (use the average, the maximum, or the latest received information), but such heuristics might not be efficient.
Note also that the use of validity time impacts the PCRF since the PCRF node is otherwise not timer-based.
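The ambiguity between several unexpired reports can be made concrete with a short sketch. The entry layout, the function names, and the numeric values (times in seconds) are assumptions for illustration only; the three heuristics are the ones named above.

```python
from typing import Dict, List, Tuple

# Per-UE entries held by a hypothetical PCRF:
# reporting RCAF -> (congestion level, time received, validity time)
Entry = Tuple[int, float, float]


def unexpired_levels(entries: Dict[str, Entry], now: float) -> List[Tuple[float, int]]:
    """Return (time received, level) for entries whose validity time has not elapsed."""
    return [
        (received_at, level)
        for level, received_at, validity in entries.values()
        if now < received_at + validity
    ]


def resolve(entries: Dict[str, Entry], now: float) -> Dict[str, float]:
    """Apply the average, maximum and latest-received heuristics to unexpired entries."""
    live = unexpired_levels(entries, now)
    if not live:
        # All validity times elapsed: the congestion is assumed to be over.
        return {"average": 0.0, "maximum": 0.0, "latest": 0.0}
    levels = [level for _, level in live]
    return {
        "average": sum(levels) / len(levels),
        "maximum": float(max(levels)),
        "latest": float(max(live)[1]),  # level of the most recently received entry
    }


# RCAF1 reported level 3 at t=0, RCAF2 reported level 1 at t=300; both valid for 900 s.
entries = {"RCAF1": (3, 0.0, 900.0), "RCAF2": (1, 300.0, 900.0)}
```

At `now=600` both entries are still valid and the three heuristics disagree (2.0, 3.0 and 1.0), so the PCRF has no principled way to pick the actual congestion level from the timers alone.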
Approach 2. The use of timestamps in our case is problematic. First, there is no timing synchronization between the RCAF nodes. But even if sufficiently accurate timing information were available, it would not be sufficient, due to the long and unpredictable delay in the OAM based data reporting. As noted earlier, it is possible that the UE moves from RCAF1 to RCAF2, yet the congestion information is reported from RCAF2 earlier than from RCAF1. As the RCAF uses long time-scale OAM reporting (such as reporting with a 15-minute period), the RCAF has no way to determine the whereabouts of a UE on a shorter time scale. So the time ordering of the signaling messages from the RCAF nodes to the PCRF is not sufficient to determine the ordering of UE mobility events.
Approach 3. Sequence numbering is not applicable in our case, because there is no way to transfer sequence number state from RCAF1 to RCAF2. That is because each RCAF acts on its own, and an RCAF has no way to determine which was the previous RCAF in case of mobility, or which will be the next RCAF in case the UE moves. Hence it is not possible to establish any communication between RCAF1 and RCAF2 to transfer state information for the current sequence numbering.
Approach 4. The solution for Approach 4 involves the explicit establishment of a new Gxc session between the new SGW and the PCRF, and the explicit release of the old Gxc session between the old SGW and the PCRF. The procedure guarantees that both the establishment of the new session and the release of the old session always take place. This is possible since the procedure involves explicit context transfer from an old MME to a new MME, which in turn control the establishment and release of the sessions between the SGWs and the PCRF. (If the MME does not change, the same MME can control the session establishment and release between the SGWs and the PCRF.) This approach is not applicable in our case because there is neither a context transfer nor a single node that can control both RCAF1 and RCAF2 at mobility. Further, in our case a new RCAF may not detect a UE if it is not experiencing congestion, so it cannot be ensured that a new session to the PCRF is always established.
Approach 5. In existing mobility procedures between the MME and the HSS, the HSS sends a Cancel location message to the old MME unconditionally. That is possible since there is a context transfer between the new and old MME which guarantees that an indication from the new MME is always sent to the HSS. In our case, such a message from the PCRF to the old RCAF to release the old context cannot always be sent, because there is no guarantee that the new RCAF will signal to the PCRF. Furthermore, it is not possible to send such a message unconditionally, because it is possible that an RCAF node indicates to the PCRF a change to the no-congestion state.