Communication systems and networks provide the foundation for information exchange between end systems and users. In general, the ability to exchange information between two end systems may depend on a number of other network nodes, i.e., critical network nodes. Critical network nodes may include, for example, nodes along the path for delivery of data and/or control signals, e.g., messages. In some communication systems, the set of critical nodes may also include nodes that are used for service authorization, accounting, call setup, paging and many other functions. Ideally, a communication system or network should provide some level of robustness in order to be useful. Robustness in communication systems may be achieved in many ways, including the use of high-reliability components, system design redundancy, and fault-tolerant protocol designs.
A common fault-tolerant protocol design technique relies on what is known as a soft-state refresh mechanism. In this type of approach, state that is established in a component or system as a result of protocol operation, e.g., information about a device or communications session, is only considered valid (and thus maintained) for a fixed period of time after it is established. Upon expiration of a soft-state time-out, the state is removed from the system. Thus, if the state is required for a period of time longer than the soft-state time-out, the state must be refreshed via protocol signaling prior to expiration of the soft-state time-out. This is the primary approach used in most Internet Protocol (IP) technology. Furthermore, in many cases IP technology places the burden of performing the soft-state refresh on the end system. This is consistent with the end-to-end design principle, which is one of the guiding philosophies of IP technology. This principle suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. One implication of this principle is that complexity should be placed in the end systems, leaving the intervening network simple. Note that this is in contrast with most circuit-switched communication systems, which strive to keep end systems simple.
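The soft-state behavior described above can be illustrated with a minimal sketch. It is not drawn from any particular protocol: the class name SoftStateTable, its method names, and the 30-second time-out in the usage example are all hypothetical, and the caller supplies the current time explicitly so that the behavior is easy to follow.

```python
class SoftStateTable:
    """Hypothetical soft-state store: entries expire unless refreshed in time."""

    def __init__(self, timeout):
        self.timeout = timeout   # soft-state time-out, in seconds (assumed unit)
        self._state = {}         # key -> stored state
        self._expiry = {}        # key -> absolute time at which the state expires

    def install(self, key, value, now):
        """Establish state; it is valid for one time-out period from 'now'."""
        self._state[key] = value
        self._expiry[key] = now + self.timeout

    def expire(self, now):
        """Remove every entry whose time-out has elapsed; return the removed keys."""
        expired = [k for k, t in self._expiry.items() if t <= now]
        for k in expired:
            del self._state[k]
            del self._expiry[k]
        return expired

    def refresh(self, key, now):
        """Extend the lifetime of existing state.

        Returns False if the state has already expired or was never installed,
        in which case the end system would have to establish it again.
        """
        self.expire(now)
        if key not in self._state:
            return False
        self._expiry[key] = now + self.timeout
        return True

    def get(self, key, now):
        """Return the state if it is still valid at time 'now', else None."""
        self.expire(now)
        return self._state.get(key)


# Usage: state installed at t=0 with a 30 s time-out, refreshed once at t=25.
table = SoftStateTable(timeout=30.0)
table.install("session-1", {"peer": "node-a"}, now=0.0)
table.refresh("session-1", now=25.0)      # lifetime extended to t=55
still_valid = table.get("session-1", now=50.0)
after_expiry = table.get("session-1", now=55.0)
```

In this sketch the burden of calling refresh falls on whoever installed the state, which mirrors the case in which the end system is responsible for performing the soft-state refresh.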
While a soft-state refresh mechanism has many benefits, it also has some significant limitations. One limitation is the tradeoff between the timeliness of detecting (and potentially recovering from) failures and communication overhead. Faster failure detection and recovery is achieved by making soft-state time-out values small, but this also increases protocol signaling and communication overhead. In large-scale communication networks this can also impact scalability. For example, in a cellular communication system the number of end nodes may be very large. If each end node uses a soft-state refresh mechanism to maintain connectivity via a central network node, e.g., a mobility agent node, the use of small soft-state time-out values also increases the signaling and processing burden on the mobility agent node. Therefore, it may not be practical to reduce soft-state time-out values below some threshold. This in turn limits the timeliness of failure detection and recovery and results in longer service disruption times following a failure.
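The tradeoff can be made concrete with a rough calculation. The sketch below rests on several assumptions that are not part of the discussion above: every end node refreshes at a fixed interval, the time-out equals a small whole number of refresh intervals (three here), and message loss and jitter are ignored. The function name and the node count are hypothetical.

```python
def soft_state_tradeoff(num_end_nodes, timeout_s, refreshes_per_timeout=3):
    """Estimate refresh load and failure-detection delay at a central node.

    Assumes each end node sends one refresh every
    timeout_s / refreshes_per_timeout seconds (an illustrative choice).
    Returns (refresh messages per second, upper bound on detection delay in s).
    """
    refresh_interval_s = timeout_s / refreshes_per_timeout
    # Aggregate rate of refresh messages arriving at the central node.
    load_msgs_per_s = num_end_nodes / refresh_interval_s
    # State last refreshed just before a failure survives for up to one full
    # time-out, so the failure goes undetected for at most that long.
    max_detection_delay_s = timeout_s
    return load_msgs_per_s, max_detection_delay_s


# Usage: one million end nodes (an assumed figure) at several time-out values.
for timeout_s in (300.0, 30.0, 3.0):
    load, delay = soft_state_tradeoff(1_000_000, timeout_s)
    print(f"time-out {timeout_s:>5.0f} s: {load:>12,.0f} msgs/s, "
          f"detection within {delay:.0f} s")
```

Under these assumptions, shortening the time-out from 300 seconds to 30 seconds cuts the worst-case detection delay tenfold but raises the refresh load on the central node from 10,000 to 100,000 messages per second, which is why a practical lower bound on the time-out tends to exist.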
In view of the above discussion, it is apparent that there is a need for improved methods and apparatus for supporting fault tolerant communication networks.