A computer network is a collection of interconnected computing devices that can exchange data and share resources. In a packet-based network, such as the Internet, the computing devices communicate data by dividing the data into small blocks called packets, which are individually routed across the network from a source device to a destination device. The destination device extracts the data from the packets and assembles the data into its original form. Dividing the data into packets enables the source device to resend only those individual packets that may be lost during transmission.
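The packet mechanism described above can be illustrated with a minimal sketch. The `(sequence number, payload)` packet layout and the function names `packetize`, `missing`, and `reassemble` are assumptions made for illustration; they do not correspond to any particular protocol.

```python
from typing import Dict, List, Tuple

# Hypothetical packet layout: (sequence number, payload).
Packet = Tuple[int, bytes]


def packetize(data: bytes, size: int) -> List[Packet]:
    """Split data into numbered packets of at most `size` bytes."""
    return [(seq, data[off:off + size])
            for seq, off in enumerate(range(0, len(data), size))]


def missing(received: Dict[int, bytes], total: int) -> List[int]:
    """Return the sequence numbers the destination has not yet received."""
    return [seq for seq in range(total) if seq not in received]


def reassemble(received: Dict[int, bytes], total: int) -> bytes:
    """Rebuild the original data once every packet has arrived."""
    if missing(received, total):
        raise ValueError("cannot reassemble: packets still missing")
    return b"".join(received[seq] for seq in range(total))


message = b"state information for the router"
packets = packetize(message, size=8)

# Simulate the loss of packet 2 during transmission.
received = {seq: payload for seq, payload in packets if seq != 2}
lost = missing(received, len(packets))

# The source resends only the lost packets, not the whole message.
for seq in lost:
    received[seq] = packets[seq][1]

restored = reassemble(received, len(packets))
```

In this sketch only the single lost packet is retransmitted, which is the benefit of packetization that the paragraph above points to.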
Certain devices within the network, such as routers, maintain tables of information that describe routes through the network. A “route” can generally be defined as a path between two locations on the network. Upon receiving an incoming data packet, the router examines destination information within the packet to identify the destination for the packet. Based on the destination, the router forwards the packet in accordance with the routing table.
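A routing table lookup of this kind might be sketched as follows. The table contents, the interface names, and the choice of longest-prefix matching are assumptions for illustration; the text above states only that the router forwards the packet in accordance with the routing table.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: destination prefix -> outgoing interface name.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "ifc-0",
    ipaddress.ip_network("10.1.0.0/16"): "ifc-1",
    ipaddress.ip_network("0.0.0.0/0"): "ifc-default",
}


def lookup(destination: str) -> Optional[str]:
    """Return the interface for the most specific route covering the address."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    # Assumption: prefer the longest (most specific) matching prefix.
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

For example, `lookup("10.1.2.3")` selects the `/16` route over the broader `/8` route, while an address outside both falls through to the default route.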
The physical connection between devices within the network is generally referred to as a link. A router uses interface cards (IFCs) for receiving and sending data packets via network links. These IFCs are installed in physical slots within the router that contain physical connections known as ports. Interfaces are configured within the IFCs using interface configurations. These IFCs are sometimes referred to as interface components because the IFCs may be implemented as more than one physical card.
Generally, a router maintains state information. For example, a router may maintain state information representing the current state of the interfaces between the router and the network. Such state information may include information representing the state of one or more IFCs, such as the current configuration of the IFCs. As additional examples, a router may maintain state information representing the state of one or more forwarding engines, one or more routing engines, or other resources within the router.
In particular, a process operating within a router may maintain the state information and communicate changes to the state information to various other processes or components within the router. These other processes or components are sometimes referred to as “consumers,” because they receive and utilize the state information maintained by a first process. These consumers make use of the state information when performing their various functions.
As the complexity of conventional networks has increased in recent years, management of the state information within a router or other network device has likewise become a significant challenge. Some existing methods for managing state information involve caching the information within the operating system, and issuing state change notification messages to software modules executing within the router. In response, the software modules retrieve the state information from the operating system.
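The notify-then-retrieve pattern described above can be sketched as follows. The class names `StateCache` and `Consumer`, and the use of in-process callbacks in place of real notification messages, are assumptions for illustration only.

```python
from typing import Callable, Dict, List


class StateCache:
    """Stands in for state cached within the operating system (hypothetical)."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}
        self._listeners: List[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def update(self, key: str, value: str) -> None:
        # Store the new state, then notify each module that the key changed.
        # The notification carries only the key, not the new value.
        self._state[key] = value
        for listener in self._listeners:
            listener(key)

    def get(self, key: str) -> str:
        return self._state[key]


class Consumer:
    """A software module that retrieves state in response to a notification."""

    def __init__(self, cache: StateCache) -> None:
        self.cache = cache
        self.seen: Dict[str, str] = {}
        cache.subscribe(self.on_change)

    def on_change(self, key: str) -> None:
        # Second step of the pattern: fetch the changed state from the cache.
        self.seen[key] = self.cache.get(key)


cache = StateCache()
consumer = Consumer(cache)
cache.update("ifc-0", "up")
```

Each state change in this sketch costs one notification plus one retrieval per consumer, which is why the approach depends on the rate of state change remaining modest.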
These conventional methods may be adequate if the rate of state change is relatively low. When the rate of state change increases, however, the rate of generation of state change messages may exceed the capacity of the consumers to receive and process the state information. In addition, the rate of generation of state change messages may exceed the capacity of the communication channel between consumers to carry messages, and may exceed the capacity of the sender to store messages.
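The overload condition can be illustrated with a small deterministic simulation. The function `simulate`, its parameters, and the policy of dropping messages when the channel is full are all assumptions for illustration; a real system might block or buffer at the sender instead.

```python
import queue
from typing import Tuple


def simulate(ticks: int, produced_per_tick: int,
             consumed_per_tick: int, capacity: int) -> Tuple[int, int]:
    """Return (delivered, dropped) message counts for a bounded channel."""
    channel: "queue.Queue[Tuple[int, int]]" = queue.Queue(maxsize=capacity)
    delivered = 0
    dropped = 0
    for tick in range(ticks):
        # The sender emits state change messages for this tick.
        for n in range(produced_per_tick):
            try:
                channel.put_nowait((tick, n))
            except queue.Full:
                dropped += 1  # the channel cannot carry any more messages
        # The consumer processes what it can in the same tick.
        for _ in range(consumed_per_tick):
            try:
                channel.get_nowait()
                delivered += 1
            except queue.Empty:
                break
    return delivered, dropped


low_rate = simulate(ticks=5, produced_per_tick=1, consumed_per_tick=1, capacity=4)
high_rate = simulate(ticks=5, produced_per_tick=3, consumed_per_tick=1, capacity=4)
```

With one message per tick the consumer keeps up and nothing is dropped. With three messages per tick the channel fills and messages begin to be lost.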
To further compound the problem, routers are increasing in complexity. For example, some conventional routers may include a primary control unit and one or more standby control units, all of which may require state information. In the event that the primary control unit fails, one of the standby control units assumes control of the routing resources to continue operation of the router. The process of switching control of routing functions between the primary and standby control units is often referred to as failover. State information managed by processes executing on the primary control unit may be required by the standby control unit to assume control and continue operation of the router resources. However, once the primary control unit fails, some or all of the state information managed by processes executing on the primary control unit may be lost. In some instances, to assume proper control and ensure operation, the standby control unit is forced to “relearn” the lost state information from each resource, e.g., by power cycling the router resources to a known state.
As part of any failover recovery process in which control units and interface cards attempt to re-synchronize state information for each process within the router, a standby control unit identifies any processes running within the control unit that possess state information that differs from the corresponding state information present within the interface cards. These two sets of state information are typically identical because a primary control unit sends changes to both the interface cards and the standby control units. If the state information changes have been implemented in both the interface cards and the standby control unit prior to a failover, the standby control unit may begin operation in place of the primary control unit without any problems. If the state information changes have not been implemented on both units, or more precisely, if the changes have been implemented on one unit but not the other at the time of the failover, the state information is out of sync for this particular control unit/interface card pair.
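The comparison performed by the standby control unit might be sketched as follows. Representing each unit's state as a dictionary keyed by process name, and the name `out_of_sync`, are assumptions for illustration.

```python
from typing import Dict, List

_MISSING = object()  # marks a process for which a unit holds no state at all


def out_of_sync(standby: Dict[str, int],
                interface_card: Dict[str, int]) -> List[str]:
    """Return the processes whose state differs between the two units."""
    keys = standby.keys() | interface_card.keys()
    return sorted(
        key for key in keys
        if standby.get(key, _MISSING) != interface_card.get(key, _MISSING)
    )


# Hypothetical snapshot at failover: the last change to "ifc-proc"
# reached the interface card but not the standby control unit.
standby_state = {"route-proc": 7, "ifc-proc": 3}
card_state = {"route-proc": 7, "ifc-proc": 4}
stale = out_of_sync(standby_state, card_state)
```

An empty result corresponds to the case in which the standby control unit can begin operation without further recovery.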
Various methods for preventing or correcting this out-of-sync condition have been implemented in the past. Many of these methods include halting all operations between the control units and the interface cards until the change to the state information has occurred. As such, the control units may ensure that all updates, except for possibly an in-process state change operation, have been implemented. These approaches typically place an emphasis on keeping the various units in sync, thus minimizing the amount of relearning that needs to occur. However, these approaches impose a significant cost upon the operation of the control units. When a state information change is to occur, the above approaches require that all other operations within a process wait for the update to occur. If a particular interface component is busy processing state-related operations for different processes, many processes that may be related to other interface components that are not currently busy may be paused until the pending update occurs. This situation typically causes many processes within the control units and numerous interface components to operate less efficiently than is desired.
Other approaches to solving the re-synchronization/relearning problem have attempted to include enough data within the requests sent to both standby control units and interface components that a standby control unit may determine the data lost from uncompleted state information update operations using the data found in the standby control unit, the interface components, or both. If the control units possess the state change information while the interface components do not, the control units may use the data to resend the required information to the interface components to replace the lost data. If the control units do not possess the state change information while the interface components do, the control units may request that the lost state change data be transmitted from the interface components to the control units, again replacing the lost data.
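The two recovery directions described above can be sketched as a single planning step. The change identifiers, the dictionary representation, and the name `plan_recovery` are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def plan_recovery(control_changes: Dict[int, str],
                  interface_changes: Dict[int, str]
                  ) -> Tuple[Dict[int, str], List[int]]:
    """Decide which changes to resend and which to request.

    Returns (resend, request):
      resend  - changes held only by the control unit, to be sent
                to the interface component
      request - ids of changes held only by the interface component,
                to be requested by the control unit
    """
    resend = {cid: data for cid, data in control_changes.items()
              if cid not in interface_changes}
    request = sorted(cid for cid in interface_changes
                     if cid not in control_changes)
    return resend, request


# Hypothetical state at failover: change 3 never reached the interface
# component, and change 4 never reached the control unit.
control = {1: "set mtu 1500", 2: "link up", 3: "set mtu 9000"}
interface = {1: "set mtu 1500", 2: "link up", 4: "link down"}
resend, request = plan_recovery(control, interface)
```

Note that the plan only works if each unit stores the full change data for every other unit.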
This additional approach again attempts to maximize the possible recovery of lost state information change data. However, this additional approach imposes a significant cost on the operation of the various units in several ways. First, this additional approach requires that the state information change data that is sent to every unit involved in the update operation include all of the data needed to update every other unit involved in the process. Typically, interface components do not need all of the information maintained by the control units, as some of it may relate to user interface and display operations of the system that are unrelated to the operation of the interface components. As such, requiring the transmission and maintenance of this additional data requires each control unit and interface component in the system to possess additional data storage to maintain the data, and places additional demands on the data communication resources used to send data between the control units and interface components.