The goal of failure protection in packet-based network nodes is to minimize the loss of packets when there is a failure of some portion of the network node. Typical network nodes include a series of port interfaces, a switch fabric, and a control module. The port interfaces connect the network node to external network links and manage the input/output operations between the external links and the network node. The switch fabric provides pathways between each port of the network node for the forwarding of packets, and the control module performs the central processing activities required to ensure that incoming packets are properly forwarded. Two critical operations performed by the control module involve implementing the protocols that are used by the network node to forward packets and maintaining protocol databases that are generated as a result of implementing the protocols. Example protocols that are implemented by the control module include Layer 2 protocols, such as Spanning Tree Protocol (STP), Link Aggregation Control Protocol (LACP), and Layer 2 Learning, and Layer 3 protocols, such as Open Shortest Path First (OSPF), Border Gateway Protocol (BGP), and Intermediate System-to-Intermediate System (IS-IS), where the layers are defined by the International Organization for Standardization (ISO) in the Open Systems Interconnection (OSI) model.
The operations performed by the control module are critical to the proper forwarding of packets within a network node, and therefore network nodes are commonly equipped with redundant control modules. For example, a network node often includes a primary control module that is actively operating and a secondary control module that can quickly take over in the event of a failure of the primary control module. Even with a secondary control module in place to provide failure protection, packets may still be lost during the switchover from the primary to the secondary control module if the switchover time is too long. In order to minimize the loss of packets in the event of a failure of the primary control module, it is important to minimize the switchover time from the primary control module to the secondary control module. A critical aspect of minimizing switchover time involves providing the secondary control module with quick access to current versions of the protocol databases upon switchover.
One technique for ensuring that the secondary control module has quick access to current versions of the protocol databases involves maintaining an active protocol database in a memory that can be accessed by both the primary and secondary control modules. For example, an active protocol database can be maintained in a shared memory that is accessible to both the primary and secondary control modules. If the primary control module fails, then operation is switched over to the secondary control module and the secondary control module can immediately access the active protocol database in the shared memory. FIG. 1 depicts an embodiment of a network node 100 that includes port interfaces 102A, 102B, and 102C, a switch fabric 104, a primary control module 106, a secondary control module 108, and a shared memory 110. The active protocol database 112 is maintained in the shared memory. Relying on the active protocol database in the shared memory for failure protection allows for a fast switchover. However, if there is a problem with the active protocol database itself or with the shared memory in which the protocol database is stored, then switching from the primary to the secondary control module does not provide reliable failure protection. Specifically, if the failure of the primary control module was caused by a problem with the protocol database or the shared memory in which the database is stored, then the same failure is likely to occur in the secondary control module in the event of a switchover.
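The limitation of the shared memory technique can be illustrated with a minimal sketch. The class, function, and field names below are hypothetical and are not taken from FIG. 1; a Python dictionary stands in for the active protocol database in the shared memory, and a corrupt entry is modeled, as an assumption, by a value of None.

```python
class ControlModule:
    """Hypothetical control module that reads a protocol database."""

    def __init__(self, name, database):
        self.name = name
        self.database = database  # a reference to the shared database, not a copy
        self.hardware_failed = False  # models a failure local to this module

    def process(self):
        """Return the number of usable entries, or raise if the module fails."""
        if self.hardware_failed:
            raise RuntimeError(f"{self.name}: module failure")
        for destination, next_hop in self.database.items():
            # Assumption: a corrupt entry is modeled as a next hop of None.
            if next_hop is None:
                raise RuntimeError(f"{self.name}: corrupt entry for {destination}")
        return len(self.database)


def run_with_switchover(primary, secondary):
    """Use the primary; on failure, switch over to the secondary."""
    try:
        return ("primary", primary.process())
    except RuntimeError:
        try:
            return ("secondary", secondary.process())
        except RuntimeError:
            return ("none", 0)


# Both control modules reference the same database object (the shared memory).
shared_db = {"10.0.0.0/8": "port-102A", "192.168.0.0/16": "port-102B"}
primary = ControlModule("primary", shared_db)
secondary = ControlModule("secondary", shared_db)

# Case 1: the primary fails on its own; the secondary takes over immediately.
primary.hardware_failed = True
module_failure = run_with_switchover(primary, secondary)

# Case 2: the shared database is corrupt; the switchover does not help.
primary.hardware_failed = False
shared_db["172.16.0.0/12"] = None
database_failure = run_with_switchover(primary, secondary)

print(module_failure)    # the secondary serves the two shared entries
print(database_failure)  # neither control module can operate
```

In the first case the secondary control module needs no synchronization because it reads the same database. In the second case both control modules encounter the same corrupt entry, which corresponds to the single point of failure described above.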
Another technique for quickly providing a current version of a protocol database to the secondary control module involves maintaining a copy of the active protocol database for the secondary control module. For example, the active protocol database can be maintained on the primary control module and a backup protocol database, which is a copy of the active protocol database, can be maintained on the secondary control module. FIG. 2 depicts a network node 200 with port interfaces 202A, 202B, and 202C, a switch fabric 204, and dual control modules 206 and 208, in which the active protocol database 212 is maintained on the primary control module and the backup protocol database 214 is maintained on the secondary control module. To ensure that proper forwarding decisions are made in the event of a switchover from the primary to the secondary control module, the backup protocol database is kept up to date with the active protocol database through periodic database updates that are sent from the primary control module. A problem with relying on periodic database updates from the primary control module to the secondary control module is that some database updates may not reach the secondary control module in the event of a failure of the primary control module. If the secondary control module does not receive each periodic database update, then the secondary control module will be left with an incomplete protocol database.
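The problem with periodic database updates can likewise be sketched in a few lines. The names below are hypothetical and do not describe the elements of FIG. 2; the sketch assumes that the primary control module accumulates changes and sends them to the secondary control module only when a periodic update occurs.

```python
class SecondaryControlModule:
    """Hypothetical secondary control module holding the backup protocol database."""

    def __init__(self):
        self.backup_db = {}


class PrimaryControlModule:
    """Hypothetical primary control module holding the active protocol database."""

    def __init__(self):
        self.active_db = {}
        self.pending = {}  # changes made since the last periodic update

    def learn(self, destination, next_hop):
        """Record a new entry in the active database and queue it for the backup."""
        self.active_db[destination] = next_hop
        self.pending[destination] = next_hop

    def send_periodic_update(self, secondary):
        """Send all queued changes to the secondary, then clear the queue."""
        secondary.backup_db.update(self.pending)
        self.pending = {}


primary = PrimaryControlModule()
secondary = SecondaryControlModule()

primary.learn("10.0.0.0/8", "port-202A")
primary.send_periodic_update(secondary)  # the backup is now current

primary.learn("192.168.0.0/16", "port-202B")
# Assumption: the primary fails here, before the next periodic update is sent.

# Entries in the active database that never reached the backup database.
missing = set(primary.active_db) - set(secondary.backup_db)
print(missing)
```

Any entry learned after the last update that was successfully sent remains only on the failed primary control module, so the secondary control module takes over with an incomplete backup protocol database.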