The operational status of different network processing devices needs to be monitored to ensure reliable network operation. If a network device is identified as being down, the network administrator or automated mechanisms can reconfigure the network to route around the failed device to a standby device. The standby device is maintained in the same state as the primary device so that network operations can continue if the primary device fails.
It is difficult to accurately identify a device failure. Techniques such as pinging have been developed to monitor network components. Pinging operations monitor devices by sending test packets. If the ping test packet is not returned, a failure condition is identified for the network processing device. However, pinging uses substantial network bandwidth because test packets are sent back and forth between the monitoring device and each device under test.
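The bandwidth cost of pinging can be illustrated with a rough estimate. The sketch below is not taken from any particular system: the function name `probe_traffic_bps` and the figures used (100 devices, 64-byte probes, one probe per second) are assumptions chosen only to show how probe traffic grows with the number of monitored devices and the probe rate.

```python
def probe_traffic_bps(num_devices: int, packet_bytes: int, interval_s: float) -> float:
    """Estimate steady-state ping traffic in bits per second.

    Assumes every probe is answered, so each device costs one request
    and one reply of the same size per probe interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    packets_per_second = 2 * num_devices / interval_s  # request + reply
    return packets_per_second * packet_bytes * 8


# Hypothetical figures: 100 devices, 64-byte probes, one probe per second.
estimate = probe_traffic_bps(100, 64, 1.0)
print(f"{estimate:.0f} bit/s")  # prints "102400 bit/s"
```

Under these assumptions the probe traffic scales linearly with the number of devices and inversely with the probe interval, so detecting failures faster costs proportionally more bandwidth.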
A Resource Policy Management System (RPMS) consists of multiple stateful components and a stateless component called a Remote Access Service Router (RASER). All of these components are implemented as individual processes and can be deployed on separate hosts. These components communicate with each other via the User Datagram Protocol (UDP). Every request is followed by a response. The RASER acts as a front end for the system and receives all traffic from Universal Gateways (UGs). The RASER routes this traffic to the appropriate RPMS component. The RASER also routes inter-component traffic.
The stateful RPMS components can be deployed as hot-standby pairs. For fault tolerance, if a component fails, all of its traffic can be redirected to the standby RPMS component. Since the RASER routes traffic, it should be able to detect component failures, whether caused by a process, host, or network failure, and redirect traffic accordingly.
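The redirection to a hot standby described above can be sketched as a routing table that maps each component to a primary and a standby address. This is only a minimal illustration: the class names `StandbyPair` and `Router`, the component name "policy", and the addresses are all hypothetical and are not taken from the RPMS itself.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Address = Tuple[str, int]  # (host, UDP port); hypothetical representation


@dataclass
class StandbyPair:
    """A stateful component and its hot standby."""
    primary: Address
    standby: Address
    use_standby: bool = False

    @property
    def active(self) -> Address:
        return self.standby if self.use_standby else self.primary


class Router:
    """Front end that forwards traffic to the active member of each pair."""

    def __init__(self, pairs: Dict[str, StandbyPair]) -> None:
        self._pairs = pairs

    def route(self, component: str) -> Address:
        """Return the address that traffic for `component` should go to."""
        return self._pairs[component].active

    def mark_failed(self, component: str) -> None:
        """Redirect all later traffic for `component` to its standby."""
        self._pairs[component].use_standby = True


# Hypothetical component name and addresses.
router = Router({"policy": StandbyPair(("10.0.0.1", 5000), ("10.0.0.2", 5000))})
before = router.route("policy")
router.mark_failed("policy")
after = router.route("policy")
print(before, "->", after)
```

The sketch leaves open how a failure is detected; it only shows that, once a component is marked as failed, every subsequent routing decision for that component resolves to the standby.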
However, using UDP in RPMS communications can indicate component failures even when the host machine is available. To solve this failure detection problem, pinging is used to periodically send test packets to the RPMS components. If the test packets are not returned (ping failure), communication is switched to a standby component. However, as described above, pinging each RPMS component uses substantial network bandwidth.
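One round of the ping-based switching described above might look like the following sketch. The function `ping_round`, the component and host names, and the stand-in `fake_probe` are assumptions made for illustration; a real probe would send a test packet and wait for it to be returned.

```python
from typing import Callable, Dict, List


def ping_round(
    active: Dict[str, str],
    standby: Dict[str, str],
    probe: Callable[[str], bool],
) -> List[str]:
    """Probe every active component once; switch failed ones to standby.

    `probe(host)` stands in for sending a test packet and waiting for the
    reply; it returns False when no reply arrives (ping failure).
    Returns the names of the components that were switched.
    """
    switched = []
    for name, host in list(active.items()):
        if not probe(host):
            active[name] = standby[name]
            switched.append(name)
    return switched


# Hypothetical deployment: two components, one of which stops answering.
active = {"comp-a": "host-a1", "comp-b": "host-b1"}
standby = {"comp-a": "host-a2", "comp-b": "host-b2"}
probes_sent = []


def fake_probe(host: str) -> bool:
    probes_sent.append(host)
    return host != "host-b1"  # simulate host-b1 not returning the test packet


switched = ping_round(active, standby, fake_probe)
print(switched, active, len(probes_sent))
```

Note that every round sends a probe to every component, whether or not it is healthy, which is the source of the bandwidth cost mentioned above.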