The present invention relates to networking technology. More particularly, the present invention relates to providing redundancy and non-stop forwarding in a network through an active router and a standby router.
Local area networks (LANs) are commonly connected with one another through one or more routers so that a host (a PC or other arbitrary LAN entity) on one LAN can communicate with other hosts on different LANs. Typically, the host is able to communicate directly only with the entities on its local LAN segment. When it receives a request to send a data packet to an address that it does not recognize as being local, it communicates through a router (or other layer-3 device) which determines how to direct the packet between the host and the destination address. Unfortunately, a router may become inoperative for a variety of reasons (e.g., a power failure, rebooting, scheduled maintenance, etc.). Such potential router failure has led to the development and use of redundant systems, that is, systems having more than one router to provide a backup in the event of primary router failure. When a router fails, the host communicating through the inoperative router may still remain connected to other LANs if it can send packets to another router connected to its LAN.
Various protocols have been devised to allow a host to choose a router from among a group of routers in a network. Two of these, Routing Information Protocol (RIP) and ICMP Router Discovery Protocol (IRDP), are examples of protocols that involve dynamic participation by the host. However, because both RIP and IRDP require that the host be dynamically involved in the router selection, performance may be reduced and special host modifications and management may be required.
In a widely used and somewhat simpler approach, the host recognizes only a single “default” router. In this approach, the host is configured to send data packets to the default router when it needs to send packets to addresses outside its own LAN. It does not keep track of available routers or make decisions to switch to different routers. This requires very little effort on the host's part, but carries a serious danger. If the default router fails, the host cannot send packets outside of its LAN. This is true even if a redundant router is able to take over, because the host does not know about the backup. Unfortunately, such systems have been used in mission-critical applications such as stock trading. The shortcomings of these early systems led to the development and implementation of a hot standby router protocol (HSRP) by Cisco Systems, Inc. of San Jose, Calif. A more detailed discussion of the earlier systems and of an HSRP type of system can be found in U.S. Pat. No. 5,473,599 (referred to herein as “the '599 patent”), entitled STANDBY ROUTER PROTOCOL, issued Dec. 5, 1995 to Cisco Systems, Inc., which is incorporated herein by reference in its entirety for all purposes. Also, HSRP is described in detail in RFC 2281, entitled “Cisco Hot Standby Router Protocol (HSRP)”, by T. Li, B. Cole, P. Morton and D. Li, which is incorporated herein by reference in its entirety for all purposes.
HSRP forwards data packets from a host on a LAN through a virtual router. The host is configured so that the packets it sends to destinations outside of its LAN are always addressed to the virtual router. The virtual router may be any physical router elected from among a group of routers connected to the LAN. The router from the group that is currently emulating the virtual router is referred to as the “active” router. Thus, packets addressed to the virtual router are handled by the active router. A “standby” router, also from the group of routers, backs up the active router so that if the active router becomes inoperative, the standby router automatically begins emulating the virtual router. This allows the host to always direct data packets to an operational router without monitoring the routers of the network.
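The active/standby takeover described above can be illustrated with a minimal sketch. It is not the HSRP implementation: the class and method names (`Router`, `VirtualRouter`, `elect`, `forward`) are hypothetical, list order stands in for the priority-based election that real HSRP performs, and the `operational` flag stands in for failure detection through hello messages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Router:
    name: str
    operational: bool = True


@dataclass
class VirtualRouter:
    """Hypothetical model of a router group emulating one virtual router."""
    group: List[Router]
    active: Optional[Router] = None
    standby: Optional[Router] = None

    def elect(self) -> None:
        up = [r for r in self.group if r.operational]
        # Keep a working active router; otherwise promote a working standby,
        # falling back to the first operational router in the group.
        if self.active is None or not self.active.operational:
            if self.standby is not None and self.standby.operational:
                self.active = self.standby
            else:
                self.active = up[0] if up else None
        # Pick a new standby if the role is vacant, failed, or was just promoted.
        if (self.standby is None or not self.standby.operational
                or self.standby is self.active):
            spare = [r for r in up if r is not self.active]
            self.standby = spare[0] if spare else None

    def forward(self, packet: str) -> str:
        # The host always addresses the virtual router; whichever physical
        # router is currently active handles the packet.
        self.elect()
        if self.active is None:
            raise RuntimeError("no operational router in the group")
        return f"{self.active.name} forwarded {packet}"


r1, r2, r3 = Router("r1"), Router("r2"), Router("r3")
vr = VirtualRouter([r1, r2, r3])
first = vr.forward("p1")     # handled by r1, with r2 as standby
r1.operational = False       # the active router fails
second = vr.forward("p2")    # r2 takes over, r3 becomes standby
```

The host-side call is the same before and after the failure, which is the point of the virtual router: the host never learns which physical router handled its packet.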
Non-stop forwarding (NSF) is the capability of a system to maintain calls and their state even in the event of a hardware or software fault on the active router. In order to achieve this, checkpointing is generally done between the active and standby routers, such that all recent call states that are present on the active router are transferred to the standby router. Note that this needs to be done as soon as any state change occurs, so that the state is available to the standby router in the event of a catastrophic fault, e.g., one causing a switchover in which the standby router becomes the active router.
The number of checkpointing transactions that are required to keep two processors in synchronization for most types of calls is limited. Generally, checkpointing transactions need to take place as a call is accepted into the system, as state changes occur during call setup (optional in many NSF highly available systems), once the call reaches steady state, and then at call termination. For the majority of call types there is no state change that requires checkpointing between the time the call reaches steady state and the time that the call terminates. However, there are some types of calls, such as TCP connections where one end terminates on a router, which are difficult to checkpoint in such a way that NSF may be achieved.
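As a rough sketch of the checkpointing points listed above, the following model sends one transaction to the standby at call acceptance, optionally during setup, at steady state, and at termination. All names (`CallState`, `ActiveRouter`, `StandbyRouter`, `checkpoint_setup`) are hypothetical and chosen for illustration; a real system would carry far more state per call.

```python
from enum import Enum
from typing import Dict


class CallState(Enum):
    ACCEPTED = "accepted"
    SETUP = "setup"
    STEADY = "steady"
    TERMINATED = "terminated"


class StandbyRouter:
    def __init__(self) -> None:
        self.calls: Dict[int, CallState] = {}

    def checkpoint(self, call_id: int, state: CallState) -> None:
        # Mirror the active router's view; terminated calls are forgotten.
        if state is CallState.TERMINATED:
            self.calls.pop(call_id, None)
        else:
            self.calls[call_id] = state


class ActiveRouter:
    def __init__(self, standby: StandbyRouter, checkpoint_setup: bool = False) -> None:
        self.standby = standby
        self.checkpoint_setup = checkpoint_setup  # setup checkpoints are optional
        self.calls: Dict[int, CallState] = {}
        self.transactions = 0

    def set_state(self, call_id: int, state: CallState) -> None:
        if state is CallState.TERMINATED:
            self.calls.pop(call_id, None)
        else:
            self.calls[call_id] = state
        # Skip the optional setup checkpoint unless it was requested.
        if state is CallState.SETUP and not self.checkpoint_setup:
            return
        # Checkpoint immediately, so a fault right after the change loses nothing.
        self.standby.checkpoint(call_id, state)
        self.transactions += 1


standby = StandbyRouter()
active = ActiveRouter(standby)
for state in (CallState.ACCEPTED, CallState.SETUP, CallState.STEADY):
    active.set_state(1, state)
mirrored = dict(standby.calls)   # what the standby would hold at switchover
```

In this model a call costs a small, fixed number of transactions regardless of how long it lasts, which is why such calls are easy to keep synchronized.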
The TCP protocol provides for recovery from data that is damaged, lost, duplicated, or delivered out of order by the internet communication system. The TCP protocol is described in detail in RFC 793, entitled “Transmission Control Protocol (TCP)”, by the Information Sciences Institute, University of Southern California, which RFC document is incorporated herein by reference in its entirety for all purposes. Recovery is achieved by assigning a sequence number to each segment transmitted, and requiring a positive acknowledgement (ACK) from the receiving TCP endpoint. If the ACK is not received within a timeout interval, the segment is retransmitted. At the receiver, the sequence numbers (SEQ) are used to correctly order segments that may be received out of order and to eliminate duplicates. A transmit window and a receive window are maintained on each end of the connection and determine the valid range of SEQ and ACK numbers that will be accepted.
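The receiver-side behavior described above (reordering, duplicate elimination, and acknowledgement) can be sketched as follows. This is a simplified illustration, not an implementation of RFC 793: the `Receiver` class is hypothetical, segments are assumed never to partially overlap, and sequence-number wraparound is ignored.

```python
from typing import Dict


class Receiver:
    """Hypothetical, simplified TCP-like receiver."""

    def __init__(self, initial_seq: int, window: int) -> None:
        self.next_expected = initial_seq      # next in-order sequence number
        self.window = window                  # size of the receive window
        self.pending: Dict[int, bytes] = {}   # out-of-order segments by SEQ
        self.delivered = bytearray()          # data handed to the application

    def receive(self, seq: int, data: bytes) -> int:
        """Accept one segment and return the cumulative ACK number."""
        in_window = self.next_expected <= seq < self.next_expected + self.window
        # Segments outside the window (including old duplicates) are discarded.
        if in_window and seq not in self.pending:
            self.pending[seq] = data
        # Deliver every segment that is now contiguous with the in-order data.
        while self.next_expected in self.pending:
            chunk = self.pending.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected


rx = Receiver(initial_seq=100, window=1000)
ack1 = rx.receive(105, b"world")   # arrives early, held back
ack2 = rx.receive(100, b"hello")   # fills the gap, both are delivered
ack3 = rx.receive(100, b"hello")   # duplicate, discarded
```

A sender that does not see the ACK advance within its timeout would retransmit, and the duplicate check above makes that retransmission harmless.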
For a router that is the endpoint for a large number of concurrent TCP connections, it may not be possible to continuously checkpoint the sequence and acknowledgment numbers that are transmitted in each TCP connection, due to the speed of the data being transmitted. The difficulty grows with the number of concurrent TCP connections for which the router is an endpoint. Hence, there is some state that cannot be preserved during switchover to the standby router. However, knowledge of the TCP sequence numbers is an essential component required for the operation of a TCP connection. The receiver will discard any packet that it receives that is not within its window of valid SEQ numbers. Likewise, the sender will discard an acknowledgement packet should the received ACK number not be in the valid window. Thus, the connection will be dropped if reasonably current sequence numbers are lost during the switchover.
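To make the failure mode concrete, the sketch below checks whether a sequence number falls inside a peer's window, using 32-bit modular arithmetic because TCP sequence numbers wrap. The helper name `in_window` and the specific numbers are assumptions chosen only for illustration.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def in_window(value: int, left: int, size: int) -> bool:
    """Return True if value lies in [left, left + size) modulo 2**32."""
    return (value - left) % MOD < size


# Illustrative values: the peer expects SEQ 5_000_000 with a 65_535-byte window.
peer_next_expected = 5_000_000
peer_window = 65_535

# The standby's last checkpoint is far behind the live connection.
stale_seq = 1_000_000
current_seq = 5_000_000

stale_accepted = in_window(stale_seq, peer_next_expected, peer_window)
current_accepted = in_window(current_seq, peer_next_expected, peer_window)
```

With the stale value the peer discards everything the new active router sends, so the connection cannot make progress; with a reasonably current value the same check passes.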
In view of the above, it would be desirable to reliably provide recovery of a TCP connection terminated on a router after a router switchover event.