The present invention relates to communication systems. More particularly, and not by way of limitation, the present invention is directed to a router and a method in an Internet Protocol (IP)-based network for migrating protocol processes from one route processor to another route processor using a graceful restart procedure.
In state-of-the-art IP routers, separate processors handle control functions and packet-forwarding functions. When routing and signaling protocols running on the control processor fail, forwarding of payload traffic is interrupted even though all the information required to perform such forwarding is available to the packet-forwarding processor. This interruption occurs because neighbor routers detect the failure of the routing and signaling protocols and assume that the entire router has failed. Consequently, the neighbor routers compute alternate paths bypassing the “failed” router. During this time, called routing convergence time, there is a potential for traffic loss.
To address this problem, the Internet Engineering Task Force (IETF) has standardized a set of extensions to the routing and signaling protocols to gracefully handle the restart of a failed protocol process on a neighbor router. When these extensions are implemented and a router's control software must be restarted, the router's neighbors continue to use it for forwarding traffic. Neighbors also help the restarted router software relearn the state that was known prior to the failure.
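By way of illustration only, the neighbor-side behavior described above can be sketched as follows. The sketch is a simplified model and not an implementation of any particular IETF extension; the class name `Neighbor`, its methods, and the "active"/"stale" labels are assumptions introduced here. It shows a neighbor that retains routes through the restarting router instead of discarding them, and that discards only those routes not relearned once the restart completes.

```python
from dataclasses import dataclass, field


@dataclass
class Neighbor:
    """Hypothetical model of one router's view of a peer router."""
    supports_graceful_restart: bool
    # Maps a prefix reachable through the peer to "active" or "stale".
    routes_via_peer: dict = field(default_factory=dict)

    def learn(self, prefix):
        """Record a route advertised by the peer."""
        self.routes_via_peer[prefix] = "active"

    def on_peer_control_failure(self):
        """React to losing the peer's routing or signaling session."""
        if self.supports_graceful_restart:
            # Keep forwarding through the peer; mark routes for later refresh.
            for prefix in self.routes_via_peer:
                self.routes_via_peer[prefix] = "stale"
        else:
            # Without the extensions the peer is treated as fully failed.
            self.routes_via_peer.clear()

    def on_peer_readvertise(self, prefix):
        """The restarted peer has advertised this route again."""
        self.routes_via_peer[prefix] = "active"

    def on_restart_complete(self):
        """Drop routes that were not refreshed during the restart."""
        self.routes_via_peer = {
            p: s for p, s in self.routes_via_peer.items() if s == "active"
        }

    def can_forward(self, prefix):
        """True while a route through the peer is retained, stale or not."""
        return prefix in self.routes_via_peer
```

In this model, a neighbor without the extensions loses every route through the peer as soon as the control session fails, which corresponds to the routing convergence interval described earlier.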
It is also known for IP network operators to build, manage, and provision virtual private networks (VPNs) on top of their existing infrastructure. These networks are typically used by enterprises that need interconnectivity between geographically distributed sites. Using a private network is also appealing because it offers a level of protection from intruders. Telecom network operators also use VPNs to provide traffic separation between various classes of telecom traffic. This is useful for providing different quality of service (QoS) and security services to these traffic classes.
With the growth in the size and speed of IP networks, routers or packet processing nodes must be scalable. Otherwise, as the demands for processing power increase, or as more customer VPNs are configured, more and more routers must be deployed, adding operational complexity and expenditure. To handle the resiliency and scalability needs of IP and telecom networks, routers or packet processing nodes are increasingly designed using a cluster of processors. To address scalability needs, middleware known as cluster management software distributes the processing load across multiple processors. The increase in processing demands is addressed by adding more processors to the cluster and migrating processes to the new processors. Any state needed by the process is also migrated to its new location by the cluster management software. Although effective, the use of processor clusters increases the complexity and cost of implementation of the routers.
An alternative that offers resiliency against node failures is to maintain one or more control processors (usually known as Route Processors or RPs) in hot-standby state to back up a primary RP. The protocol state is replicated between the primary and backup RPs. If the primary RP fails, the backup RP takes over and masks the failure from the router's neighbors. The complexity in this approach is in synchronizing state information between the primary and backup RPs. For protocols like Border Gateway Protocol (BGP) and Label Distribution Protocol (LDP) that run over the Transmission Control Protocol (TCP), TCP session state (such as sequence numbers, congestion window parameters, and the like) must also be replicated.
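For illustration, the replication burden of the hot-standby approach can be sketched as follows. This is a minimal model under stated assumptions, not a description of any actual router software: the names `TcpSessionState`, `RouteProcessor`, `replicate`, and `fail_over`, and all field names, are hypothetical. The sketch shows that both the protocol state and the per-session TCP state must be copied to the backup RP before it can take over.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TcpSessionState:
    """Per-session TCP values named in the text; field names are illustrative."""
    send_seq: int
    recv_seq: int
    congestion_window: int


@dataclass
class RouteProcessor:
    """Hypothetical route processor holding protocol and transport state."""
    name: str
    role: str  # "primary" or "backup"
    protocol_state: dict = field(default_factory=dict)
    # Maps a peer address to the TCP state of the session with that peer.
    tcp_sessions: dict = field(default_factory=dict)


def replicate(primary, backup):
    """Copy the primary's state to the backup as independent objects."""
    backup.protocol_state = copy.deepcopy(primary.protocol_state)
    backup.tcp_sessions = copy.deepcopy(primary.tcp_sessions)


def fail_over(backup):
    """Promote the backup; it continues from the last replicated state."""
    backup.role = "primary"
    return backup
```

Any state the primary changes after the most recent call to `replicate` is absent from the backup in this model, which is one way to picture the synchronization difficulty noted above.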
Thus, what is needed in the art is a more efficient way to handle the resiliency and scalability needs of IP and telecom networks that overcomes the deficiencies of conventional systems and methods. The present invention provides such a router and method.