Switched network elements, such as layer 2 and layer 3 switches, typically include management modules for participating in layer 2 and layer 3 protocols, learning layer 2 and layer 3 addresses, and distributing copies of forwarding tables to input/output modules associated with ports. In order to provide increased reliability, some switched network elements include primary and backup management modules. During normal operation, the primary management module performs switch management and packet forwarding functions. The backup management module monitors the operation of the primary management module and takes over when the primary management module fails.
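By way of illustration only, the monitoring relationship between the primary and backup management modules can be sketched as follows. The sketch assumes a periodic heartbeat and a fixed silence timeout. Neither mechanism is specified above, and the names `BackupMonitor`, `timeout_s`, `on_heartbeat`, and `poll` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupMonitor:
    """Backup management module that watches a primary (illustrative only)."""

    timeout_s: float               # assumed: longest tolerated silence from the primary
    last_heartbeat_s: float = 0.0  # time at which the primary was last heard from
    role: str = "backup"

    def on_heartbeat(self, now_s: float) -> None:
        # Record that the primary was alive at time now_s.
        self.last_heartbeat_s = now_s

    def poll(self, now_s: float) -> str:
        # Take over once the primary has been silent for longer than the timeout.
        if self.role == "backup" and now_s - self.last_heartbeat_s > self.timeout_s:
            self.role = "primary"
        return self.role


monitor = BackupMonitor(timeout_s=3.0)
monitor.on_heartbeat(10.0)
role_while_healthy = monitor.poll(12.0)  # 2.0 s of silence, within the timeout
role_after_failure = monitor.poll(13.5)  # 3.5 s of silence, exceeds the timeout
```

Timestamps are passed in explicitly so that the sketch is deterministic. An actual module would read a clock and would also have to decide what state to take over with, which is the subject of the discussion that follows.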
Conventional failover methods between primary and backup switch management modules are not hitless. That is, received packets are discarded or routed around the failed device while the failover occurs. For example, in one type of failover, when the primary switch management module fails, the backup switch management module initializes: it loads configuration information, begins participating in routing and forwarding protocols from an initial state, and, once it has built its routing and forwarding tables, begins forwarding packets. Depending on protocol timeout values, the time required to load configuration information, participate in the routing and forwarding protocols from an initial state, and program hardware to begin forwarding packets may be long enough that other devices in the network mark the failed device as unavailable and route packets around it. In addition, packets that were already received by the device will be dropped.
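The dependence on protocol timeout values can be made concrete with a short sketch. It assumes that the three initialization phases run one after another, so that their durations add, and that a neighboring device routes around the failed device once the outage exceeds the neighbor's timeout. The function names and all durations are hypothetical and are not measurements of any device.

```python
def cold_failover_outage_s(load_config_s: float,
                           converge_s: float,
                           program_hw_s: float) -> float:
    """Total forwarding outage when the backup initializes from scratch.

    Assumes the phases are strictly sequential, so their durations add.
    """
    return load_config_s + converge_s + program_hw_s


def neighbors_route_around(outage_s: float, neighbor_timeout_s: float) -> bool:
    """True if a neighbor times out before the device forwards again."""
    return outage_s > neighbor_timeout_s


# Hypothetical durations in seconds, chosen only for illustration.
outage = cold_failover_outage_s(load_config_s=20.0,
                                converge_s=45.0,
                                program_hw_s=5.0)
short_timeout_result = neighbors_route_around(outage, neighbor_timeout_s=40.0)
long_timeout_result = neighbors_route_around(outage, neighbor_timeout_s=180.0)
```

With these assumed values the outage is 70 seconds, so a neighbor with a 40 second timeout routes around the device while a neighbor with a 180 second timeout does not. In either case, packets already received by the device during the outage are still dropped.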
In light of the problems associated with failovers between switch management modules, it is desirable to perform a failover in which complete initialization of the new primary switch management module is not required. For example, it is desirable for the backup switch management module to take over the duties of the failed primary switch management module without resetting the packet forwarding hardware. In such a failover, some packets may be forwarded by the new primary switch management module without being dropped. However, the failover may still not be hitless because network protocols must be restarted from the initial state in the new primary switch management module. Participating in those protocols to develop the proper protocol states and data structures takes time, during which packets may be dropped. Thus, even a failover in which packet forwarding hardware is not reset may not be completely hitless.
Another problem associated with switch management modules arises during software upgrades. Software upgrades have not been hitless because of the inability to communicate protocol state and data structure information between different software versions. For example, if the backup switch management module is initialized with a new software version while the primary switch management module is executing a prior software version, the data structures used by the two software versions may not be compatible. As a result, the new software version may not be capable of using data generated by the switch management module executing the prior software version. Therefore, the backup switch management module executing the new software version must initialize its hardware and begin participating in forwarding protocols in order to build its forwarding databases. As discussed above, because of the time required to participate in network protocols and build the appropriate data structures, software upgrades that require switch management module initialization may not be hitless.
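The incompatibility between software versions can be illustrated with a sketch in which the prior version stores a forwarding entry as a MAC address and a port, and the new version expects an additional VLAN field. Both layouts, and the names `OLD_ENTRY`, `NEW_ENTRY`, and `new_version_can_use`, are hypothetical. They are not taken from any actual switch software.

```python
import struct

# Hypothetical record layouts, network byte order, no padding.
OLD_ENTRY = struct.Struct("!6sH")   # prior version: MAC address, port
NEW_ENTRY = struct.Struct("!6sHH")  # new version: MAC address, VLAN, port


def new_version_can_use(blob: bytes) -> bool:
    """Return True if the new software version can decode this record."""
    try:
        NEW_ENTRY.unpack(blob)
        return True
    except struct.error:
        # The record does not have the size the new layout requires.
        return False


mac = bytes.fromhex("001122334455")
old_blob = OLD_ENTRY.pack(mac, 7)       # written by the prior version
new_blob = NEW_ENTRY.pack(mac, 100, 7)  # written by the new version

old_usable = new_version_can_use(old_blob)
new_usable = new_version_can_use(new_blob)
```

In this sketch the record written by the prior version is 8 bytes long while the new layout requires 10 bytes, so the new version rejects it. A module in that position has no usable forwarding data from its peer and must rebuild its databases by participating in the protocols, as described above.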
Accordingly, in light of the problems associated with switch management module failover and upgrade, there exists a need for improved methods and systems for hitless switch management module failover and upgrade.