In an IMS network, routing is used to locate a user or a function in the network. The main routing mechanism in the IMS network is DNS. In operation, a first IMS node may attempt to send a message to a second IMS node. If the second IMS node is, for some reason, not available to receive traffic, it may be added to a blacklist maintained at the first IMS node. The second IMS node may indicate to the first IMS node that it is not available to receive traffic, or the first IMS node may infer from the behaviour of the second IMS node that it is unavailable.
Depending on the reason for blacklisting, an entire host or individual ports of a host (including its transport protocols) may have to be blacklisted. An IMS node is blacklisted for a predetermined period of time. The period of time may be determined according to the event that triggered the blacklisting.
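By way of illustration only, the blacklist granularity and event-dependent period described above might be sketched as follows. The event names, the durations, and the `Blacklist` class are hypothetical and are not taken from any particular IMS implementation; a port of `None` is assumed here to mean that the entire host is blacklisted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical trigger events and blacklist periods in seconds (illustrative values).
BLACKLIST_PERIOD_S = {
    "transaction_timeout": 30.0,
    "service_unavailable": 60.0,
}

# (host, port, transport); port and transport of None denote the entire host.
Key = Tuple[str, Optional[int], Optional[str]]


@dataclass
class Blacklist:
    # Maps each blacklisted key to the time at which its entry expires.
    entries: Dict[Key, float] = field(default_factory=dict)

    def add(self, host: str, event: str, now: float,
            port: Optional[int] = None, transport: Optional[str] = None) -> None:
        """Blacklist a whole host, or one port/transport, for an event-dependent period."""
        self.entries[(host, port, transport)] = now + BLACKLIST_PERIOD_S[event]

    def is_blacklisted(self, host: str, port: int, transport: str, now: float) -> bool:
        """True if the whole host or the specific port/transport is currently blacklisted."""
        for key in ((host, None, None), (host, port, transport)):
            expiry = self.entries.get(key)
            if expiry is not None and now < expiry:
                return True
        return False
```

In this sketch an entry simply lapses when its period has elapsed, which corresponds to the node being removed from the blacklist after the predetermined period of time.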
When a host that is still faulty is removed from a blacklist, a large number of calls may fail. For example, consider a system in which a first IMS node comprising a Call Session Control Function (CSCF) distributes Session Initiation Protocol (SIP) calls to two hosts, which are a second and a third IMS node. The CSCF has a call load of 100 calls per second (Cps), and these calls are distributed over the two hosts in round-robin fashion. If one of the hosts fails (e.g. the second IMS node enters an error state due to power failure), then 50% of calls will be directed towards the faulty host until the error state is detected and the failed host (the second IMS node) is blacklisted. Once the failed node is blacklisted, current implementations require that it is removed from the blacklist when the appropriate period of time has elapsed. Typically, after a few initial short blacklist periods on the order of 30 seconds each, the CSCF will remove the failed host from the blacklist every 10 minutes, on the assumption that by that time the failed host will have recovered.
If the failed host is in an error state for a prolonged period, then each time it is removed from the blacklist the error state will be detected again and the failed host will be blacklisted again.
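The repeated cycle described above can be illustrated with a minimal sketch. The number of initial short periods (three), the function names, and the assumption that the initial fault also takes one detection time to be noticed are illustrative assumptions; the 30-second and 10-minute periods follow the typical figures given above, and the 32-second detection time is the default SIP transaction timeout.

```python
from typing import Iterator, List


def blacklist_periods(short_trials: int = 3, short_s: float = 30.0,
                      long_s: float = 600.0) -> Iterator[float]:
    """Yield successive blacklist periods: a few short ones, then long ones indefinitely."""
    for _ in range(short_trials):
        yield short_s
    while True:
        yield long_s


def premature_removals(outage_s: float, detect_s: float = 32.0) -> List[float]:
    """Times (seconds after the fault) at which a still-failed host leaves the blacklist."""
    t = detect_s  # assumed: the initial fault is detected after one detection time
    times = []
    for period in blacklist_periods():
        t += period          # the host stays blacklisted for this period
        if t >= outage_s:    # the host has recovered by the time it is removed
            break
        times.append(t)      # removed while still faulty
        t += detect_s        # time taken to detect the fault and blacklist again
    return times
```

Under these assumptions, a 25-minute outage (`premature_removals(1500)`) gives five removals while the host is still faulty, each followed by a further detection interval during which calls continue to be directed to it.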
Each time the failed host is removed from the blacklist, it typically takes 32 seconds to detect that it has not yet recovered (32 seconds is the default SIP transaction timeout). If we assume that any call setup delayed by more than 10 seconds is considered a lost call, then call setups are lost during the remaining 32−10=22 seconds of the detection interval, so that 100 Cps×½×22 seconds=1100 call setups are lost before the failed host is again blacklisted. Accordingly, current arrangements result in a large number of call setups being lost every time a failed host is removed from the blacklist before it has recovered from an error state. Lost call setups reduce the effectiveness of the network and also have a negative impact on the quality of service for the end users.
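The estimate above can be reproduced with a short calculation. The function and parameter names are hypothetical; the default values are those of the example, namely a load of 100 Cps, half of the calls directed to the failed host, a 32-second detection time, and a 10-second tolerance before a call setup is considered lost.

```python
def lost_setups_per_removal(call_rate_cps: float = 100.0,
                            share_to_failed_host: float = 0.5,
                            detect_s: float = 32.0,
                            tolerance_s: float = 10.0) -> float:
    """Call setups lost each time a still-failed host is removed from the blacklist."""
    # Only setups delayed beyond the tolerance are counted as lost.
    lossy_window_s = max(detect_s - tolerance_s, 0.0)
    return call_rate_cps * share_to_failed_host * lossy_window_s


print(lost_setups_per_removal())  # 1100.0
```

The loss scales linearly with the call load and with the share of traffic directed to the failed host, so a higher load or fewer alternative hosts increases the number of call setups lost at each premature removal.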
For at least the above reasons, there is a need for a method and apparatus for improved handling of IMS node blacklisting.