The Diameter Base Protocol provides a framework for a range of Authentication, Authorization and Accounting (AAA) applications. The base protocol itself specifies the message format, transport properties and functionality, basic error reporting, and security-related functions to be used by all Diameter applications.
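The message format shared by all Diameter applications begins with a fixed 20-byte header: Version (always 1), a 24-bit Message Length, Command Flags (R, P, E, T bits), a 24-bit Command Code, Application-ID, Hop-by-Hop Identifier and End-to-End Identifier. The Python sketch below only packs and unpacks that header. The helper names (`encode_header`, `decode_header`) are hypothetical, and AVP encoding is deliberately left out.

```python
import struct

HEADER_LEN = 20          # fixed Diameter header size in bytes
FLAG_REQUEST = 0x80      # 'R' bit: message is a request
FLAG_PROXIABLE = 0x40    # 'P' bit: message may be proxied/relayed
FLAG_ERROR = 0x20        # 'E' bit: answer carries a protocol error
FLAG_RETRANSMIT = 0x10   # 'T' bit: potentially retransmitted message
DEVICE_WATCHDOG = 280    # command code shared by DWR and DWA


def encode_header(length: int, flags: int, command_code: int,
                  app_id: int, hop_by_hop: int, end_to_end: int) -> bytes:
    """Pack a Diameter header; length and command code are 24-bit fields."""
    return (bytes([1])                          # Version, always 1
            + length.to_bytes(3, "big")         # Message Length incl. header
            + bytes([flags])                    # Command Flags
            + command_code.to_bytes(3, "big")   # Command Code
            + struct.pack(">III", app_id, hop_by_hop, end_to_end))


def decode_header(data: bytes) -> dict:
    """Unpack the first 20 bytes of a Diameter message."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated Diameter header")
    if data[0] != 1:
        raise ValueError(f"unsupported Diameter version {data[0]}")
    app_id, hop_by_hop, end_to_end = struct.unpack(">III", data[8:HEADER_LEN])
    return {
        "length": int.from_bytes(data[1:4], "big"),
        "flags": data[4],
        "is_request": bool(data[4] & FLAG_REQUEST),
        "command_code": int.from_bytes(data[5:8], "big"),
        "application_id": app_id,
        "hop_by_hop": hop_by_hop,
        "end_to_end": end_to_end,
    }
```

For example, `encode_header(20, FLAG_REQUEST, DEVICE_WATCHDOG, 0, 1, 1)` yields a header-only watchdog request. A real request would also carry Origin-Host and Origin-Realm AVPs, so its Message Length would exceed 20. The Hop-by-Hop Identifier is rewritten by each agent, whereas the End-to-End Identifier stays constant across hops.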
Diameter applications, whether defined by the Internet Engineering Task Force (IETF), such as the Diameter Credit Control Application (DCCA), or vendor specific, such as Third Generation Partnership Project (3GPP) Gx, are built on top of the base protocol and inherit its parameters and capabilities in an object-oriented manner. Possibly because of this flexibility and ease of reuse, Diameter has become very popular in 3GPP, where Diameter-based applications are now used on numerous interfaces, e.g. Gx, Gxx, S6a, S6d, Rx, Cx and Sh.
Network elements that implement Diameter may act in one of three defined roles: client, server or agent. The agent role may in addition be subdivided into relay, proxy, redirect and translation agents.
A Diameter transport connection between two Diameter nodes, here the server 103 and the client 101 as illustrated in FIG. 1, is called a peer. Diameter peers are connection oriented and may use either the Transmission Control Protocol (TCP) or the Stream Control Transmission Protocol (SCTP) as the underlying transport protocol. The base protocol defines mechanisms for peer establishment, maintenance and teardown.
In Diameter, a client that wishes to request services from a Diameter server may contact the server directly over an existing peer, or it may set up a new Diameter peer towards the server. This presupposes that the client knows the destination host Fully Qualified Domain Name (FQDN) and the Internet Protocol (IP) address of the server; the latter may be obtained via the Domain Name System (DNS) if only the FQDN is known. If no direct peer to the server 103 exists, the client 101 may route requests via an agent 102 in the destination realm. For certain types of agent 102, e.g. a relay agent, the agent 102 forwards the request to a server 103 in the destination realm. The response message is likewise sent via the agent 102 back to the client 101.
Once the client knows which server handles the Diameter session, it may, depending on the network configuration, attempt to establish a new peer connection directly towards that server, e.g. by looking up the destination host IP address via DNS, or it may continue to send subsequent requests via the agent for the lifetime of the Diameter session. Note that in either case the Diameter session spans more than one peer connection.
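The routing choices described above can be sketched as a small next-hop selector on the client side. This is a minimal illustration, not a standard API: the class `ClientRouter`, its methods and the example FQDNs are hypothetical. The only protocol elements borrowed from Diameter are the Destination-Host and Origin-Host AVP names. The client prefers a direct peer to the server handling the session and otherwise falls back to an agent in the destination realm.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ClientRouter:
    """Hypothetical next-hop selection for a Diameter client (sketch only)."""
    open_peers: Set[str]                 # FQDNs with an established peer
    realm_agents: Dict[str, str]         # destination realm -> agent FQDN
    session_server: Dict[str, str] = field(default_factory=dict)

    def next_hop(self, session_id: str, dest_realm: str,
                 dest_host: Optional[str] = None) -> str:
        # Prefer the server named in Destination-Host, or the server learned
        # for this session, but only if a direct peer to it already exists.
        host = dest_host or self.session_server.get(session_id)
        if host in self.open_peers:
            return host
        # Otherwise route via an agent in the destination realm.
        try:
            return self.realm_agents[dest_realm]
        except KeyError:
            raise LookupError(f"no route to realm {dest_realm}") from None

    def on_answer(self, session_id: str, origin_host: str) -> None:
        # The Origin-Host of the answer tells the client which server
        # handles this session, even when the answer arrived via an agent.
        self.session_server[session_id] = origin_host
```

In this sketch, the first request of a session goes to the agent. After `on_answer` records the server's Origin-Host, requests keep going via the agent until a direct peer to that server is opened and added to `open_peers`. From then on, the same session uses the direct connection.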
The Diameter Base Protocol was designed as a generic AAA protocol and was therefore based on the assumption that its intended use would be similar to that of its predecessors, e.g. Remote Authentication Dial In User Service (RADIUS). As such, its design still carries an air of “dial-up Internet”: application-level messages are not expected to be very frequent, and although multiple simultaneous Diameter sessions between a client/server pair are supported, the protocol has not been optimized for them.
This is exemplified by the recommendation in Request For Comments (RFC) 3588 to set the default peer watchdog timer to 30 seconds and never lower than 6 seconds. If no message is received on a peer before this timer expires, a watchdog request, i.e. a “heartbeat” message, is sent to check the availability of the peer connection. If no response to this message is received before the timer expires again, the peer is deemed to have failed.
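The simplified watchdog behaviour described above can be modelled as a two-step timer. The Python sketch below follows this section's description only: it treats one unanswered watchdog request as failure and ignores the intermediate suspect state and jitter found in the full transport specification. The class `PeerWatchdog` and the `detection_time` helper are hypothetical. The sketch shows that a dead peer goes unnoticed for up to twice the timer value.

```python
from dataclasses import dataclass
from typing import Optional

MIN_TW_S = 6.0   # lower bound on the watchdog timer, per the recommendation


@dataclass
class PeerWatchdog:
    """Simplified per-peer watchdog (sketch, not the full state machine)."""
    tw_s: float = 30.0                   # watchdog timer Tw in seconds
    last_rx_s: float = 0.0               # time the last message was received
    dwr_sent_s: Optional[float] = None   # time the watchdog request was sent
    failed: bool = False

    def __post_init__(self) -> None:
        if self.tw_s < MIN_TW_S:
            raise ValueError("Tw below the 6 s lower bound")

    def on_message(self, now_s: float) -> None:
        # Any received message, including a watchdog answer, resets the timer.
        self.last_rx_s = now_s
        self.dwr_sent_s = None

    def on_tick(self, now_s: float) -> Optional[str]:
        if self.failed:
            return None
        if self.dwr_sent_s is None:
            if now_s - self.last_rx_s >= self.tw_s:
                self.dwr_sent_s = now_s
                return "send DWR"
        elif now_s - self.dwr_sent_s >= self.tw_s:
            self.failed = True
            return "peer failed"
        return None


def detection_time(tw_s: float, step_s: int = 1) -> int:
    """Seconds from the last received message until the peer is declared failed."""
    wd = PeerWatchdog(tw_s=tw_s)
    now = 0
    while not wd.failed:
        now += step_s
        wd.on_tick(now)
    return now
```

With the default Tw of 30 seconds, `detection_time(30)` returns 60. Even the 6-second minimum leaves a silent peer undetected for 12 seconds.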
Consider Diameter “super” applications such as 3GPP Gx and 3GPP Gy, where a given client and server may share on the order of millions of concurrent sessions and messages may be exchanged at rates of thousands of requests per second. In such applications, if no response message is received even for a very short time, such as a couple of milliseconds, the peer is most probably experiencing some form of connection problem. The recommendations in RFC 3588 therefore do not fit, e.g., 3GPP Gx and 3GPP Gy deployments. In this scenario the client may be e.g. a Gateway GPRS Support Node (GGSN), a Bearer Binding and Event Reporting Function (BBERF) or a Packet Data Network Gateway (PDN-GW), and the server may be e.g. a Policy and Charging Rules Function (PCRF) or an Online Charging System (OCS).
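The argument that even a few milliseconds of silence is suspicious can be checked with simple arithmetic. This sketch assumes Poisson arrivals, the characterisation of signaling traffic used in this section. Under that assumption, the probability that no message arrives in a gap of length t at rate λ is e^(−λt). The sketch also counts the requests issued while a dead peer goes unnoticed for up to two watchdog periods. The function names and the 1000 requests per second example rate are illustrative assumptions.

```python
import math


def silence_probability(rate_per_s: float, gap_s: float) -> float:
    """P(no message arrives within gap_s) for a Poisson process (assumption)."""
    return math.exp(-rate_per_s * gap_s)


def requests_before_detection(rate_per_s: float, tw_s: float) -> float:
    """Requests issued while a dead peer goes unnoticed for up to 2 * Tw:
    one idle Tw before the watchdog request, one Tw waiting for its answer."""
    return rate_per_s * 2 * tw_s
```

At 1000 requests per second, `silence_probability(1000, 0.01)` is e^(−10), about 4.5e-5. A 10 ms gap is therefore already a strong failure indication. Meanwhile, `requests_before_detection(1000, 30)` gives 60000 requests sent towards a peer that a 30-second watchdog has not yet declared failed.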
A serious problem with the Diameter base protocol emerges when such super applications are used in a scenario where a relay or proxy agent is placed between clients and servers: in this scenario there is no rate control between the client and the server.
The transport-level flow control of TCP and SCTP cannot be used, as it operates per peer connection and not per session or per client/server pair. When an agent sits in between, the server's transport connection terminates at the agent, so any back-pressure the server sees reflects the agent's state rather than the client's. Furthermore, at the link level, load should never be a limiting factor in 3GPP Diameter networks, because capacity in Operation & Maintenance (O&M) networks is usually not a problem and the traffic is signaling traffic. Signaling traffic is relatively light and its arrival process is Poisson, without the heavy-tailed distributions of e.g. Internet traffic.
Different network elements, i.e. clients and servers, have different signaling capacities, i.e. numbers of messages per second. Transport network capacity is not a problem, and there is no mechanism for congestion avoidance per client/server pair. There is therefore an evident risk that a Diameter client or server may flood the other end with requests during intense signaling.
FIG. 2 illustrates a scenario where this may happen. The example is based on 3GPP Gx, where the Gx server 201, i.e. the PCRF, has a much higher signaling capacity than the Gx client 202, i.e. the PCEF. The Gx client 202 (PCEF) and the Gx server 201 (PCRF) are connected via an agent 203 using gigabit Ethernet Network Interface Cards (NIC) and share one million simultaneous Gx sessions. The Gx client 202 may handle e.g. 1000 messages per second and the Gx server 201 e.g. 10000 requests per second.
During signaling peaks, e.g. during busy hour when the number of server-initiated Gx requests is high, the PCRF may unintentionally issue more requests to a Policy and Charging Enforcement Function (PCEF) than the PCEF can handle, causing buffer overflow or node outage. As a result, Gx requests are dropped. As no retransmission mechanism is defined for Gx, the associated calls will probably be lost.
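How quickly such an overload turns into dropped requests can be estimated with a fluid queue approximation. In this model, the client's queue grows at the difference between offered and served rates until its buffer is full. The sketch ignores Poisson burstiness, so real overflows can happen even earlier. The 1000 messages per second PCEF capacity is the example figure given for FIG. 2. The 3000 requests per second peak load, the 5000-message buffer and the function names are hypothetical values chosen only for illustration.

```python
import math


def seconds_until_overflow(offered_rate: float, service_rate: float,
                           buffer_size: float) -> float:
    """Fluid approximation: the queue grows at (offered - service) msg/s."""
    excess = offered_rate - service_rate
    if excess <= 0:
        return math.inf          # client keeps up; the buffer never fills
    return buffer_size / excess


def dropped_requests(offered_rate: float, service_rate: float,
                     buffer_size: float, duration_s: float) -> float:
    """Requests lost once the buffer is full, over a peak lasting duration_s."""
    excess = max(0.0, offered_rate - service_rate)
    return max(0.0, excess * duration_s - buffer_size)
```

With these assumed numbers, `seconds_until_overflow(3000, 1000, 5000)` is 2.5 seconds. A one-minute peak, `dropped_requests(3000, 1000, 5000, 60)`, loses 115000 requests. Without retransmission on Gx, each of those corresponds to a request that is lost, not merely delayed.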
This is a particular disadvantage when there are certain requests that the Gx server in this example would like to prioritize, e.g. requests associated with emergency and priority services. As the Gx server does not know how many requests it may safely issue, it cannot prioritize them effectively. In certain situations, e.g. New Year's Eve or a disaster, when network signaling load reaches levels far beyond what is normal even for busy hour, there will be no way to guarantee that emergency and priority requests get through.
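The reasoning above turns on a single missing input: how many requests the peer can accept. If that budget were known, prioritization itself would be a simple ordering step, as this Python sketch shows. The priority classes, `PendingRequest`, `admit` and the `budget` parameter are all hypothetical. In particular, `budget` stands for exactly the knowledge that the Gx server lacks in the scenario described here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical priority classes; a lower value means more urgent.
EMERGENCY, PRIORITY, NORMAL = 0, 1, 2


@dataclass(frozen=True)
class PendingRequest:
    seq: int          # arrival order, used to keep FIFO within a class
    priority: int
    session_id: str


def admit(pending: List[PendingRequest],
          budget: int) -> Tuple[List[PendingRequest], List[PendingRequest]]:
    """Send the most urgent requests first, up to `budget` per interval.

    `budget` is assumed to be the number of requests the peer can safely
    accept; the remaining requests are held back for a later interval.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ordered = sorted(pending, key=lambda r: (r.priority, r.seq))
    return ordered[:budget], ordered[budget:]
```

With a budget of two and a queue holding one normal, one priority and two emergency requests, both emergency requests are sent and the others wait. Without a trustworthy budget, the server can only send everything and leave the overloaded client to drop requests regardless of priority.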
This problem has been recognized within 3GPP and will be addressed by a recently approved Rel-10 work item on Enhanced Multimedia Priority Services, which will cover, among other things, functionality for prioritized signaling traffic.
For clients and servers connected by a direct peer connection, it is possible to reuse the flow control mechanisms of the transport layer, i.e. TCP or SCTP, to regulate the request rate of the adjacent peer. However, such an approach would imply either a layer violation between the transport layer and the application layer, or a Congestion Manager, which would require a special Application Programming Interface (API) to be implemented.