IP networks generally provide an excellent infrastructure for geographically distributing components of a telecommunication system. The underlying IP network is optimal for transmitting control signaling and, when bandwidth is available, can provide an acceptable Quality of Service (or QoS) or Grade of Service (or GOS) for voice communications.
One of the problems with Voice over IP (VoIP) communications is system reliability. Existing solutions concentrate on providing call processing capacity with redundant gateways, or on media network path redundancy without synchronization. The Audiocodes Median Gateway™, for example, supports N+1 redundancy, which typically requires calls to be set up again on the board that is made active (as opposed to 1+1 hot standby, which preserves calls). Call state synchronization solutions are available in some media gateways, but selection of the active component is done by a co-resident control plane, not by the processing components. Such solutions are based on heartbeat messages with associated timeouts and often require complete failure before an interchange occurs. This third party, or software message, model is implemented by software cluster solutions, such as Veritas™ and GoAhead™, and by HA Linux vendors such as Monta Vista™.
In redundant systems, rapid failure detection and operational control between two devices are difficult to ensure without a third entity that acts as a tiebreaker or actually manages the selection of the active device. In a failure situation, the control decision requires ensuring that one device is no longer active before enabling the standby device. That usually requires waiting for a heartbeat timeout, because a failing device may not be capable of notifying its peer that it is no longer providing service. Making the timeout period too short leads to conflict, incorrect failure detection (false positives), and more overhead with critical time deadlines. Longer timeouts enable more reliable operation but incur more data loss and associated service disruption, particularly in VoIP applications.
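The timeout tradeoff described above can be illustrated with a minimal sketch. It is not taken from any of the products named here; the class `HeartbeatMonitor`, the function `first_takeover_ms`, and all timing values are hypothetical and chosen only for illustration. The standby device polls at a fixed step and presumes the active peer has failed once the silence since the last heartbeat exceeds the timeout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeartbeatMonitor:
    """Standby-side view of the active peer (hypothetical, illustrative only)."""
    timeout_ms: int         # silence longer than this is treated as a failure
    last_seen_ms: int = 0   # arrival time of the most recent heartbeat

    def record_heartbeat(self, arrival_ms: int) -> None:
        self.last_seen_ms = arrival_ms

    def peer_presumed_failed(self, now_ms: int) -> bool:
        return (now_ms - self.last_seen_ms) > self.timeout_ms


def first_takeover_ms(timeout_ms: int, arrivals_ms: List[int],
                      poll_step_ms: int, end_ms: int) -> Optional[int]:
    """Return the first poll time at which the standby would take over,
    or None if it never does within the simulated window."""
    monitor = HeartbeatMonitor(timeout_ms)
    pending = sorted(arrivals_ms)
    now = 0
    while now <= end_ms:
        # Deliver every heartbeat that has arrived by this poll.
        while pending and pending[0] <= now:
            monitor.record_heartbeat(pending.pop(0))
        if monitor.peer_presumed_failed(now):
            return now
        now += poll_step_ms
    return None


# Peer is healthy, but the heartbeat due at 300 ms is delayed to 380 ms.
jittery = [0, 100, 200, 380, 400, 500]
# Peer crashes right after sending the heartbeat at 200 ms.
crashed = [0, 100, 200]

short_on_jitter = first_takeover_ms(150, jittery, poll_step_ms=10, end_ms=500)
long_on_jitter = first_takeover_ms(500, jittery, poll_step_ms=10, end_ms=500)
short_on_crash = first_takeover_ms(150, crashed, poll_step_ms=10, end_ms=2000)
long_on_crash = first_takeover_ms(500, crashed, poll_step_ms=10, end_ms=2000)
```

Under these assumed timings, the 150 ms timeout takes over at 360 ms in the jittery case even though the peer is still alive (a false positive that leaves two active devices), while the 500 ms timeout rides out the delay. After a real crash, however, the 500 ms timeout does not react until 710 ms, compared with 360 ms for the short timeout, and media is lost for the whole interval.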
In VoIP systems, fault detection and reporting are considered difficult to generalize. With the increase in demand for highly available systems, some vendors have developed third party libraries and tools that apply failure detection and response to very generalized systems. Unfortunately, these solutions can themselves be highly complex and, as a result, they consume system resources and can be difficult to implement reliably, requiring stronger processors and more overhead. For example, GoAhead advertises 54 different states for the software objects representing critical system resources and requires 3.7 Mb of runtime memory and 10 Mb of disk space. In many cases, GoAhead uses only two states, namely simple or failed, together with multiple hierarchical relationship and redundancy policies, to define behaviors when faults occur. Other script-based systems are capable of simple generalization but quickly become complex as multiple objects or events interact to determine fault behavior and must be synchronized between redundant systems. These systems also require interpreted languages typical of workstation class systems. The complexity of the various systems is surprising given that software development studies have shown a direct correlation between complexity and errors in implementation.
Other systems concentrate on database synchronization between servers. These systems are too large for smaller, embedded systems.