In a VoIP system, the communication path between an internet protocol private branch exchange (iPBX) server and an internet protocol (IP) phone client is typically established using a transport protocol such as transmission control protocol (TCP) or user datagram protocol (UDP). In either case, an application-level heartbeat scheme has been a popular approach to monitoring end-to-end path connectivity.
The timing of a typical heartbeat scheme can be determined based on a transmit timer value (tx), a receive timer value (tr), and the number of consecutive heartbeat misses (m) detected before declaring a particular path failure. These values can be summarized as (tx, tr, m). During operation, a heartbeat is usually sent after a tx timer has expired. A heartbeat miss event can be reported after a tr timer has expired. Any message transmitting activity can reset and restart the corresponding tx timer. Any message receiving activity can reset and restart the corresponding tr timer.
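The timer behavior described above can be sketched as a small state machine. This is only an illustrative sketch, not an implementation from any particular iPBX product: the class name HeartbeatEndpoint and its methods are hypothetical, time is passed in explicitly rather than read from a clock, and it is assumed that any received message clears the count of consecutive misses.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatEndpoint:
    """Hypothetical endpoint tracking the (tx, tr, m) heartbeat timers."""
    tx: float          # transmit timer value, in seconds
    tr: float          # receive timer value, in seconds
    m: int             # consecutive misses before declaring path failure
    start: float = 0.0
    misses: int = field(default=0, init=False)
    path_failed: bool = field(default=False, init=False)

    def __post_init__(self):
        self.tx_deadline = self.start + self.tx
        self.rx_deadline = self.start + self.tr

    def on_send(self, now):
        # Any transmitting activity resets and restarts the tx timer.
        self.tx_deadline = now + self.tx

    def on_receive(self, now):
        # Any receiving activity resets and restarts the tr timer.
        # Assumption: it also clears the consecutive-miss count.
        self.rx_deadline = now + self.tr
        self.misses = 0

    def tick(self, now):
        """Check both timers at time `now` and return the events raised."""
        events = []
        if now >= self.tx_deadline:
            events.append("send_heartbeat")
            self.on_send(now)
        if now >= self.rx_deadline:
            self.misses += 1
            events.append("heartbeat_miss")
            self.rx_deadline = now + self.tr
            if self.misses >= self.m:
                self.path_failed = True
                events.append("path_failure")
        return events
```

With (tx, tr, m) = (27, 30, 2) and no incoming traffic, calling tick at 27, 30, 54, and 60 seconds would yield a heartbeat transmission, a first miss, another transmission, and finally a second miss together with a path failure.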
It can be assumed that the delay time, tdelay = tr − tx, is the system latency, which indicates an estimated maximum time required for a heartbeat to be delivered and processed from end to end along a network path. A default value can be considered to be similar to the TCP default round-trip time value (3 seconds); therefore, it can be assumed that tdelay = 3 seconds.
In view of the foregoing, it generally follows that tdelay is the minimum, or lower bound, time required for detecting a heartbeat miss or path failure in a typical heartbeat scheme.
In a one-way heartbeat model, only one side sends heartbeats; the other side listens for the heartbeats. The listening side asserts a heartbeat miss when its receive timer expires after tr seconds; it declares its path broken when m consecutive heartbeats are missed. The time T required to detect a path failure in a one-way model is: [(m − 1) × tr + tdelay] < T <= m × tr.
In a two-way heartbeat model, both sides exchange heartbeats independently. Both sides listen for heartbeats and monitor the path status.
Without loss of generality, FIG. 1 illustrates a two-way heartbeat model in a typical iPBX client-server system, with particular heartbeat values of (tx, tr, m) = (27, 30, 2).
Under the model illustrated in FIG. 1, the iPBX server 102 and IP phones 104 exchange heartbeats every 27 seconds over the TCP links 106. A heartbeat miss is reported if a TCP link has been idle for 30 seconds. Missing two consecutive heartbeats on a TCP link indicates that the link is down or the primary iPBX is not reachable. The phone can then start a failover process by closing the current TCP connection to the primary iPBX and establishing a new TCP connection to a secondary iPBX 108. The failure detection time Tdetect has been a crucial factor in system availability in an IP phone network. For any link, Tdetect is in the range of 33 seconds <= Tdetect <= 60 seconds when m = 2, and 3 seconds <= Tdetect <= 30 seconds when m = 1.
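The detection-time ranges quoted above follow from the bound (m − 1) × tr + tdelay on the low side and m × tr on the high side, with tdelay = tr − tx. A minimal sketch of that arithmetic follows; the function name detection_time_bounds is a hypothetical helper introduced here for illustration only.

```python
def detection_time_bounds(tx, tr, m):
    """Return (lower, upper) bounds in seconds on the path failure
    detection time for heartbeat values (tx, tr, m)."""
    tdelay = tr - tx                 # assumed end-to-end system latency
    lower = (m - 1) * tr + tdelay    # failure occurs just before a heartbeat is due
    upper = m * tr                   # failure occurs just after a heartbeat arrives
    return lower, upper


# Values used in the FIG. 1 example.
print(detection_time_bounds(27, 30, 2))
print(detection_time_bounds(27, 30, 1))
```

For (tx, tr) = (27, 30), this gives 33 to 60 seconds when m = 2 and 3 to 30 seconds when m = 1, matching the ranges stated for FIG. 1.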
An application-level heartbeat scheme, as discussed with respect to FIG. 1, is different from a TCP protocol stack scheme. One standard type of TCP protocol stack scheme is the Keepalive scheme. In this scheme, each Keepalive message expects a Keepalive acknowledgement (ACK) from the far end of the path. If an ACK is not received after multiple retransmissions, path failure is asserted. TCP Keepalive is an optional feature of the TCP protocol. It is disabled by default. There are three parameters related to Keepalive: tcp_keepidle, tcp_keepintvl, and tcp_keepretry.
The tcp_keepidle parameter specifies the interval of inactivity that causes TCP to generate a Keepalive transmission for an application that requests it. tcp_keepidle defaults to 2 hours. The tcp_keepintvl parameter specifies the interval between the retries that are attempted if a Keepalive transmission is not acknowledged. tcp_keepintvl defaults to 75 seconds. The tcp_keepretry parameter specifies the number of retransmissions to be carried out before declaring that the remote end is not available. tcp_keepretry defaults to 8.
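To put these defaults in perspective, the approximate time for Keepalive to declare a failure on an idle link can be estimated as the idle interval plus one retry interval per unacknowledged retransmission. This is a rough sketch under that simplifying assumption, not a model of any specific TCP stack; the function name keepalive_detection_time is hypothetical.

```python
def keepalive_detection_time(keepidle_s, keepintvl_s, keepretry):
    """Approximate seconds until an idle, broken link is declared down,
    assuming each unacknowledged probe waits one full retry interval."""
    return keepidle_s + keepintvl_s * keepretry


# Defaults quoted above: 2 hours idle, 75 second retry interval, 8 retries.
default_s = keepalive_detection_time(2 * 60 * 60, 75, 8)
print(default_s, "seconds, or", default_s / 60, "minutes")
```

Under this estimate, the default settings give 7200 + 8 × 75 = 7800 seconds, which is 130 minutes.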
The TCP Keepalive scheme is not designed for fast failure detection on a TCP link. It is controlled at the kernel level within the TCP stack and has system-wide behavior, which makes it unsuitable for per-link or per-application interface purposes.
Another type of standard is the Stream Control Transmission Protocol (SCTP). SCTP is a transport layer protocol used extensively in telephony signaling applications. SCTP transports multiple message streams, whereas TCP transports a byte stream. Both TCP and SCTP have similar retransmission mechanisms for reliable data transfer. Similar to TCP Keepalive, SCTP uses a simplified probing scheme to detect path connectivity; it periodically sends “heartbeats” when no other data is being sent. If a heartbeat ACK is not received after multiple retransmission efforts, a path failure is raised. The recommended heartbeat interval is 30 seconds. The actual failure detection time is therefore much longer than 30 seconds because of the retransmission attempts.
In general, the path failure detection time of 30 to 60 seconds or more discussed above is too long for real-time service switchover (or failover) to a secondary iPBX. Reducing the heartbeat interval is typically not an effective remedy, since it can create excessive network traffic overhead.
Reference will now be made to the exemplary embodiments illustrated, and specific language will be used herein to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended.