Data communication protocols serve to facilitate transmission and reception of data across communication networks. For example, transmission control protocol (TCP), Internet protocol (IP), border gateway protocol (BGP), asynchronous transfer mode (ATM), and various other protocols facilitate communication of data between two or more locations in a communication network. Through the use of such protocols, communication of data across a plurality of communication networks may be facilitated, even though two or more of the networks comprise different operating systems and architectures.
The Open Systems Interconnection (OSI) Reference Model developed by the International Organization for Standardization (ISO) is generally used to describe the structure and function of data communications. The OSI Reference Model encompasses seven layers, often referred to as a stack or protocol stack, which define the functions of data communications protocols. The protocol stack comprises a physical layer, a data link layer, a network layer, a transport layer, a session layer, a presentation layer, and an application layer. A layer does not define a single protocol, but rather a data communications function that may be performed by any number of protocols suitable to the function of that layer. For example, a file transfer protocol and an electronic mail protocol provide user services, and are thus part of the application layer. Every protocol communicates with its peer, which is a standardized implementation of the identical protocol in the equivalent layer on a remote system. For example, a local electronic mail protocol is the peer of a remote electronic mail protocol. As another example, BGP on a local router exchanges routing information with BGP on a neighboring router.
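The layering and peer relationship described above can be illustrated with a minimal Python sketch. The names `OsiLayer`, `PROTOCOL_LAYER`, and `are_peers` are hypothetical and introduced only for illustration, and the protocol-to-layer table reflects a common textbook classification rather than anything defined in this document.

```python
from enum import IntEnum


class OsiLayer(IntEnum):
    """The seven OSI layers, numbered from the bottom of the stack."""
    PHYSICAL = 1
    DATA_LINK = 2
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7


# Assumed, commonly cited placement of a few protocols (illustrative only).
PROTOCOL_LAYER = {
    "IP": OsiLayer.NETWORK,
    "TCP": OsiLayer.TRANSPORT,
    "FTP": OsiLayer.APPLICATION,
    "SMTP": OsiLayer.APPLICATION,
    "BGP": OsiLayer.APPLICATION,  # BGP runs over TCP
}


def are_peers(local_protocol: str, remote_protocol: str) -> bool:
    """Two endpoints are peers when they run the identical known protocol,
    which by construction places them in the same layer."""
    return (
        local_protocol == remote_protocol
        and local_protocol in PROTOCOL_LAYER
    )
```

In this sketch, BGP on a local router is a peer of BGP on a neighboring router, while BGP and TCP are not peers of one another because they occupy different layers.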
Applications, such as BGP, which require a transport protocol to provide reliable data delivery, often use TCP because TCP verifies that data is delivered across a network (between separate end systems) accurately and in the proper sequence. TCP provides reliability with a mechanism referred to as Positive Acknowledgement with Retransmission (PAR). In simplest terms, a system with PAR re-transmits the data for which it has not received an acknowledgement message from a far-end node. Information is communicated between cooperating TCP modules in segments. A segment is a datagram containing a TCP header and perhaps data. The TCP header contains sequence numbers. Control information, called a handshake, is exchanged between the two endpoints to establish a dialogue before data is transmitted.
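The PAR behavior described above can be sketched in Python as a simple stop-and-wait loop. This is a deliberately simplified illustration, not TCP itself: real TCP uses timers, sliding windows, and byte-oriented sequence numbers. The functions `send_with_par` and `make_lossy_channel`, and the `max_attempts` limit, are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Tuple


def send_with_par(
    segments: List[str],
    channel: Callable[[int, str], bool],
    max_attempts: int = 5,
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Send segments in order, retransmitting each one until the far end
    acknowledges it. Returns the delivered (sequence number, payload)
    pairs and the number of attempts each segment needed."""
    delivered: List[Tuple[int, str]] = []
    attempts_used: List[int] = []
    for seq, payload in enumerate(segments):
        for attempt in range(1, max_attempts + 1):
            # channel() returns True when an acknowledgement came back.
            if channel(seq, payload):
                delivered.append((seq, payload))
                attempts_used.append(attempt)
                break
        else:
            # No acknowledgement after max_attempts transmissions.
            raise TimeoutError(f"segment {seq} was never acknowledged")
    return delivered, attempts_used


def make_lossy_channel(drops: Dict[int, int]) -> Callable[[int, str], bool]:
    """Hypothetical channel that loses the first drops[seq] transmissions
    of a given sequence number and acknowledges every later one."""
    seen: Dict[int, int] = {}

    def channel(seq: int, payload: str) -> bool:
        seen[seq] = seen.get(seq, 0) + 1
        return seen[seq] > drops.get(seq, 0)

    return channel
```

For example, with a channel built by `make_lossy_channel({1: 2})`, the segment with sequence number 1 is transmitted three times before it is acknowledged, while the other segments succeed on the first attempt and arrive in the proper sequence.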
As previously discussed, border gateway protocol (BGP) typically runs over TCP (e.g., port 179). BGP version 4 (BGP4) is the current de facto exterior routing protocol for inter-domain (autonomous systems) routing. BGP is a protocol used to advertise routes between networks of routers, e.g., between a Service Provider's network and a Carrier's network. Routers at the edges of these networks exchange BGP messages, which could affect hundreds of thousands of routes. If the BGP process at one of these edge routers terminates (e.g., because of a restart, hardware failure, software upgrade, etc.), service on the routes between the networks is usually affected. The termination also causes additional BGP messages to be exchanged between other edge routers to update information about available routes. Consequently, the termination results in a period of route instability and unavailability of the affected router, both of which are desirable to avoid. Furthermore, the termination will often result in a flood of re-routing messages being sent into the network, thus adversely affecting performance of the network.
A conventional BGP redundancy technique for addressing BGP process failures involves configuring two or more routers from different vendors in parallel. The objective of such a technique is to reduce the potential for BGP process failures by relying on the assumption that, at least some of the time, one of the routers will survive a particular set of circumstances that might lead to failure of another router. For example, at least one of the routers would ideally exhibit immunity to failures such as those that might be caused by an offending message, a hardware fault, or a software fault. That is, it is assumed that routers from different vendors are susceptible to different types of failures. This type of conventional BGP redundancy technique is generally expensive due to the inherent cost of the multiple routers and because using equipment from multiple vendors causes additional operation, support, network management, and training costs. Additionally, this type of conventional BGP redundancy technique requires additional BGP messages to be exchanged to move the routes onto the tandem router, thus increasing cost, complexity, and network traffic. The attached routers still notice that the first router has disappeared and then route around it. Accordingly, it is desirable to avoid the disadvantages associated with such a conventional BGP redundancy technique.
A graceful restart mechanism for a router is another conventional technique for addressing BGP process failures. Such a graceful restart mechanism is proposed in an Internet Engineering Task Force (IETF) draft entitled “Graceful Restart Mechanism for BGP”. In this proposal, a router has the capability of preserving its forwarding state (routes) over a BGP restart, the ability to notify its peer routers of this capability, and the ability to notify its peer routers of an estimated time for restart completion before it initiates such a restart. Upon detecting that the BGP process of the router has terminated (i.e., a failed router) and in response to receiving a corresponding notification, the peer routers do not send new best routes to accommodate the failed router unless it fails to restart within the specified time limit.
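The peer-side decision described above can be sketched in Python as follows. This is only an illustration of the logic as summarized in this paragraph, not an implementation of the IETF draft; the names `PeerView` and `peer_action`, the returned action strings, and the treatment of the exact time limit as still within bounds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PeerView:
    """What a peer router remembers about a neighbor (hypothetical fields)."""
    advertised_graceful_restart: bool  # neighbor announced the capability
    restart_time_s: float              # neighbor's estimated restart time


def peer_action(view: PeerView, elapsed_s: float, session_reestablished: bool) -> str:
    """Decide how a peer treats routes through a neighbor whose BGP
    process has terminated, elapsed_s seconds after the termination."""
    if session_reestablished:
        # Neighbor is back: exchange routing information again.
        return "resync"
    if not view.advertised_graceful_restart:
        # No capability advertised: route around the neighbor at once.
        return "reroute"
    if elapsed_s <= view.restart_time_s:
        # Within the advertised limit: keep using the preserved routes.
        return "hold_routes"
    # Neighbor failed to restart within the specified time limit.
    return "reroute"
```

Note that in this sketch a peer holding routes does nothing until the time limit expires, which is why waiting for the limit can lengthen the time needed to detect a genuine failure.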
Such a graceful restart mechanism requires that the peer routers be able to interpret and respond to the restart notification. Additionally, while the failed router is restarting, it cannot process routing updates that would normally be received. Consequently, it becomes out of date during the period of unavailability, which is followed by a burst of updates once it is back in service. These updates cause increased “churn” in the routing tables of other routers, which affects performance of the network and should therefore be avoided. Even worse, routing loops or “blackholes” may form in this period of unavailability. Such “blackholes” occur when a route is advertised as available but the corresponding router is not actually configured to support such a route, resulting in loss of packets intended to be communicated over that route. Furthermore, the router may not actually be coming back into service. Also, since a graceful restart mechanism allows the specified time limit for routers to be restarted, waiting that amount of time can increase the time it takes to detect a failure and route around the failed router. Additionally, implementation of such a graceful restart mechanism requires protocol extensions to BGP to which all routers aware of the failure must adhere in order to support the graceful restart mechanism. Accordingly, it is desirable to avoid the disadvantages associated with a graceful restart mechanism.
Therefore, it is useful to facilitate synchronization of protocol tasks and related information on redundant routing modules of a network element in a manner that overcomes the limitations associated with conventional redundancy techniques.