Organizations often have multiple locations with local area networks (LANs), also termed “campuses,” that are interconnected by one or more wide area networks (WANs). WAN connections may run over media such as analog modem, leased line, integrated services digital network (ISDN), Frame Relay, switched multi-megabit data service (SMDS), X.25, and WAN asynchronous transfer mode (ATM).
Generally, a network device such as a switch may be used in a WAN as either an “edge” device, which allows clients, servers, and LANs to communicate over the WAN, or a “core” device, which makes up the WAN itself. Typically, edge switches contain at least one interface card for communicating with the WAN (e.g., a WAN interface), and one interface card for communicating with either a client, a server, or a LAN (e.g., a LAN interface). Core switches typically contain only WAN interfaces, also referred to as “trunk cards,” to connect to other core switches.
Each interface card has one or more physical ports that may send or receive data, and the switch interconnects the physical ports to allow data received on a physical port on one interface card to be switched to a physical port on another interface card. For example, a physical port on a LAN interface may be connected to any physical port on a WAN interface. Similarly, a physical port on a WAN interface may be connected to any physical port on another WAN interface. Each physical port is typically identified as a “source” port or a “destination” port, depending on whether the physical port is sending data or receiving data, respectively. Each switched port typically has a buffer for queuing data to be transmitted. For ATM, the data is segmented into “cells” and the cells are sent, as bandwidth becomes available, in a first-in, first-out fashion.
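As an illustration of the buffering just described, the following is a minimal sketch rather than an actual switch implementation. It assumes the standard 48-byte ATM cell payload and, for simplicity, zero-pads the final cell; real ATM adaptation layers add their own framing, which is omitted here. The names `segment` and `PortBuffer` are hypothetical.

```python
from collections import deque

CELL_PAYLOAD = 48  # payload bytes in a 53-byte ATM cell (the other 5 are header)


def segment(data: bytes):
    """Split data into 48-byte cell payloads, zero-padding the last one.

    Simplified: adaptation-layer framing is deliberately left out.
    """
    cells = []
    for i in range(0, len(data), CELL_PAYLOAD):
        chunk = data[i:i + CELL_PAYLOAD]
        cells.append(chunk.ljust(CELL_PAYLOAD, b"\x00"))
    return cells


class PortBuffer:
    """Hypothetical per-port buffer that queues cells in arrival order."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, data: bytes):
        # Cells join the tail of the queue in the order they were produced.
        self._queue.extend(segment(data))

    def transmit(self, cell_slots: int):
        """Send up to `cell_slots` cells from the head of the queue."""
        sent = []
        while self._queue and len(sent) < cell_slots:
            sent.append(self._queue.popleft())
        return sent

    def __len__(self):
        return len(self._queue)
```

Here `cell_slots` stands in for the bandwidth available in one transmission opportunity; cells that do not fit simply remain queued for the next one.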
To perform the switching, a switch may contain one or more switching elements, each of which is connected to all the ports in the switch and performs the switching between the ports. In addition, each switching element has an associated scheduler that controls the timing and scheduling of the switching. Each switching element, together with its associated scheduler, is referred to as a “switch plane.” The set of switch planes is collectively known as a “switch fabric.”
Each switch plane in the switch fabric operates independently of the other switch planes, with no communication among the switch planes in the switch fabric. Each switch plane individually grants requests from the set of source ports in accordance with a predetermined algorithm. When a destination port is either congested (e.g., too much traffic is directed at that physical port) or unreachable (e.g., the interface card containing that physical port has suffered a malfunction), the switch plane denies requests directed to that destination port. The cells in the buffer of the source port are transmitted once the congestion clears or the malfunction is fixed.
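The request-and-grant behavior of a single switch plane can be sketched as follows. This is a minimal illustration under stated assumptions: the “predetermined algorithm” is not specified in the text, so the sketch simply grants the lowest-numbered requesting source port for each destination. The names `DestinationPort`, `SwitchPlane`, and `run_cycle` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DestinationPort:
    congested: bool = False
    reachable: bool = True


@dataclass
class SwitchPlane:
    """One switching element plus its scheduler (hypothetical model)."""
    destinations: dict  # destination port id -> DestinationPort

    def grant(self, requests):
        """Return {source: destination} for the requests granted this cycle.

        `requests` maps each source port id to the destination it wants.
        Assumed policy: at most one grant per destination, and the
        lowest-numbered requesting source wins.
        """
        granted = {}
        taken = set()
        for src in sorted(requests):
            dst = requests[src]
            port = self.destinations[dst]
            if port.congested or not port.reachable:
                continue  # denied: the cell stays queued at the source port
            if dst in taken:
                continue  # destination already granted to another source
            taken.add(dst)
            granted[src] = dst
        return granted


def run_cycle(plane, buffers):
    """Run one scheduling cycle.

    `buffers` maps a source port id to a deque of (destination, cell)
    pairs in first-in, first-out order. Only the head of each queue
    issues a request.
    """
    requests = {src: q[0][0] for src, q in buffers.items() if q}
    sent = []
    for src, dst in plane.grant(requests).items():
        _, cell = buffers[src].popleft()
        sent.append((src, dst, cell))
    return sent
```

Because each plane is modeled as a self-contained object with no reference to any other plane, several `SwitchPlane` instances can run side by side without coordination, which mirrors the independence described above.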
Each switch typically includes a processor card that contains the logic and processing resources needed to control and operate the switch. The processor card typically also contains computer memory needed to store data and programs for functioning of the switch. To provide for redundancy and increase the amount of time that the switch remains operational, switches typically have at least two processor cards. One processor card operates as a “master” card and is the active processor card. One or more additional processor cards act as back-up cards and are ready to take over the switching and other functions provided by the master card if the master card suffers a failure.
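The master and back-up arrangement can be sketched as a simple selection rule. This is a hedged illustration only: the `ProcessorCard` fields and the `select_master` function are hypothetical, and the `ready` flag is an assumed way of modeling whether a back-up card is able to take over.

```python
from dataclasses import dataclass


@dataclass
class ProcessorCard:
    name: str
    healthy: bool = True  # False once the card has suffered a failure
    ready: bool = True    # assumed flag: can this card take over right now?


def select_master(cards, current):
    """Return the card that should be active, or None if none can be.

    The current master is kept while it is healthy. Otherwise the first
    healthy and ready back-up in `cards` is promoted.
    """
    if current.healthy:
        return current
    for card in cards:
        if card is not current and card.healthy and card.ready:
            return card
    return None  # no card is available to take over
```

For example, with a failed master and one ready back-up, `select_master` returns the back-up; if that back-up is not ready, it returns `None`.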
Processor cards store information regarding the topology of the network and the status of the hardware in the switch, including the status of each interface card contained in the switch. This information is generally referred to as a “state table.” The processor card also stores the status information for all other switch nodes in the WAN in the state table, and a routing table that contains the list of current connections and the best routes for these connections.
When a non-recoverable software error occurs in the processor unit of the active processor card, the processor has to reset the switch by clearing the computer memory. This requires the switch to rebuild the state and routing tables as these tables are lost upon reset. To rebuild the state table, the processor unit has to reset all hardware in the switch, including each interface, and then poll them after reset to synchronize them with the processor unit. In addition, the processor unit has to poll the other switch nodes in the WAN to determine their status, and build a routing table.
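The rebuild procedure above can be sketched as follows, mainly to show how much polling it involves. This is an illustration under stated assumptions, not the method of any particular switch: `poll` is a hypothetical callable standing in for the reset-then-poll exchange, statuses are assumed to be the strings "up" or "down", and “best route” is assumed to mean fewest hops.

```python
from collections import deque


def rebuild_tables(local_cards, remote_nodes, links, self_id, poll):
    """Rebuild the state and routing tables from scratch after a reset.

    poll(target) is a hypothetical callable returning "up" or "down".
    links is an iterable of (node_a, node_b) pairs describing the WAN.
    Returns (state_table, next_hop), where next_hop maps each reachable
    node to the neighbor through which it is reached.
    """
    state_table = {}
    for card in local_cards:      # re-synchronize every local interface card
        state_table[("card", card)] = poll(card)
    for node in remote_nodes:     # query every other switch node in the WAN
        state_table[("node", node)] = poll(node)

    # Keep only links whose two endpoints both reported "up".
    up = {self_id} | {n for n in remote_nodes
                      if state_table[("node", n)] == "up"}
    adjacency = {n: set() for n in up}
    for a, b in links:
        if a in up and b in up:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Breadth-first search yields fewest-hop routes (the assumed metric).
    next_hop = {}
    queue = deque((n, n) for n in sorted(adjacency[self_id]))
    seen = {self_id} | adjacency[self_id]
    while queue:
        node, first = queue.popleft()
        next_hop[node] = first
        for neighbor in sorted(adjacency[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, first))
    return state_table, next_hop
```

Note that the number of `poll` calls grows with the number of interface cards plus the number of other switch nodes, and nothing can be routed until all of them have answered.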
The process of resetting the processor card and rebuilding the state and routing tables consumes an enormous amount of time, as the processor has to reconstruct all the information stored by the processor card. Thus, significant service disruption occurs for the devices connected to the switch node, forcing traffic either to be rerouted around the switch node or, for the links that cannot be rerouted, to be stopped.
Moreover, the process of gathering information also consumes network resources, as the switch node has to communicate with other switch nodes on the network to ask for information. The other switch nodes will then have to process the request for information and respond, using valuable processor resources.
Although a secondary processor card may be provided as a back-up, the back-up processor card is often not ready to take over control from the active card. In addition, in certain configurations, a back-up processor card does not exist. Thus, in these situations, no processor card is available to act as a standby if the active processor card encounters an unrecoverable software error.