The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Modern computer networks based on routers, switches and other infrastructure elements generally are reliable and can offer clients robust service. Consequently, end users have become less tolerant of failure and delay in network communication. Further, entities that make or lose money based upon the reliability of network equipment, such as those in the field of online commerce, are especially intolerant of network problems. Nevertheless, client devices are most adversely affected when a critical service goes down. Therefore, customers and network gear vendors are seeking ways to provide equipment that has greater fault tolerance.
Many services are provided by networks that implement Network Address Translators (NAT) at the borders of private networks or sub domains. NAT is a network node function. For example, NAT may be implemented in a router so that addresses inside one sub domain can be reused by any other sub domain. The NAT network node allows address reuse by saving a sub domain computer's non-routable IP address to an address translation table. For outgoing packets, the network node replaces the sending computer's non-routable IP address with the first available IP address out of a range of unique IP addresses. The translation table then stores a mapping between the computer's non-routable IP address and the unique IP address that was selected.
When a packet is returned from an end host outside the sub domain, the NAT network node checks the destination address on the packet. The NAT network node then looks in the address translation table to see which computer on the sub domain the packet belongs to. The NAT network node changes the destination address to the one saved in the address translation table and sends it to that computer. If the NAT network node does not find a match in the table, it drops the packet. The sub domain computer receives the packet from the network node. The process repeats as long as the computer is communicating with the external system.
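By way of illustration only, the translation and lookup steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular NAT node: the class name `NatTable`, its method names, and the example addresses are hypothetical, and the sketch models only one-to-one address mapping without ports or timeouts.

```python
from typing import Dict, List, Optional


class NatTable:
    """Toy one-to-one NAT table (hypothetical; no ports, no entry timeouts)."""

    def __init__(self, pool: List[str]) -> None:
        self.free: List[str] = list(pool)            # unused unique addresses, in order
        self.inside_to_outside: Dict[str, str] = {}  # non-routable -> unique address
        self.outside_to_inside: Dict[str, str] = {}  # unique -> non-routable address

    def translate_outbound(self, inside_src: str) -> Optional[str]:
        """Return the unique address that replaces the source of an outgoing packet."""
        if inside_src in self.inside_to_outside:
            return self.inside_to_outside[inside_src]
        if not self.free:
            return None  # pool exhausted; this sketch simply refuses to translate
        outside = self.free.pop(0)  # first available address in the range
        self.inside_to_outside[inside_src] = outside
        self.outside_to_inside[outside] = inside_src
        return outside

    def translate_inbound(self, outside_dst: str) -> Optional[str]:
        """Return the inside address for a returning packet, or None to drop it."""
        return self.outside_to_inside.get(outside_dst)


nat = NatTable(pool=["203.0.113.10", "203.0.113.11"])
print(nat.translate_outbound("10.0.0.5"))     # 203.0.113.10
print(nat.translate_inbound("203.0.113.10"))  # 10.0.0.5
print(nat.translate_inbound("203.0.113.99"))  # None (no match, packet is dropped)
```

In this sketch a returning packet whose destination address has no entry in the table yields `None`, which corresponds to the node dropping the packet.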
In a network that relies on NAT enabled nodes, it is critical that the computers within the sub domain can communicate with the NAT network nodes. If the NAT network node for a given sub domain goes down, then all communication between the sub domain and other nodes or domains will cease. Therefore, many networks that implement NAT enabled nodes rely on one or more nodes to store and perform the Network Address Translation. Typically, an active network node is elected to generate the table and relay the traffic between the private inside sub domain and the outside public domain. Additionally, a standby network node is elected to be a backup in case the active node goes down. The active network node will create and distribute the NAT information to the standby network node. The standby network node will create a copy of the active network node's information in its database or “NAT table.” A switchover protocol, such as HSRP as defined in T. Li et al., “Cisco Hot Standby Router Protocol (HSRP),” IETF Request for Comments (RFC) 2281, March 1998, controls when a standby network node becomes the active network node.
In the case where the active network node goes down, HSRP elects the standby network node as the new active node. Because the standby network node maintains a copy of the NAT information, traffic between the hosts will flow without any interruption. While the standby network node is active, traffic flows as usual and thus, new NAT sessions will be created and stored in the NAT table as traffic flows through the standby network node.
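The replication and failover behavior described above can be sketched as follows. This is an illustrative sketch only: the class `NatNode`, its fields, and the addresses are hypothetical, the switchover protocol itself is not modeled, and replication is reduced to copying each new entry to a peer node.

```python
from typing import Dict, Optional


class NatNode:
    """Hypothetical NAT node that mirrors new table entries to a peer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.table: Dict[str, str] = {}        # unique address -> inside address
        self.peer: Optional["NatNode"] = None  # node that receives copies, if any

    def add_session(self, outside: str, inside: str) -> None:
        """Record a new session and copy the entry to the peer, if one is set."""
        self.table[outside] = inside
        if self.peer is not None:
            self.peer.table[outside] = inside

    def lookup(self, outside: str) -> Optional[str]:
        return self.table.get(outside)


active = NatNode("A")
standby = NatNode("B")
active.peer = standby

# While node A is active, each new session is copied to node B.
active.add_session("203.0.113.10", "10.0.0.5")

# Node A goes down; assume the switchover protocol makes node B the forwarder.
forwarding = standby
print(forwarding.lookup("203.0.113.10"))  # 10.0.0.5 (existing session continues)

# Sessions created while node A is down exist only in node B's table.
forwarding.add_session("203.0.113.11", "10.0.0.6")
print(active.lookup("203.0.113.11"))      # None (node A never learned this entry)
```

The last lookup shows that the table held by the failed node no longer matches the table held by the node that is forwarding traffic.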
When the failed network node is brought back up or re-started, HSRP will re-establish the re-started node as the active node. If the re-started node is established as the active node before learning the existing flow information, the newly active node will incorrectly handle existing NAT sessions. Specifically, if the NAT sessions were TCP sessions and the sequence numbers or acknowledgement numbers were changed in the NAT table of the standby node, the newly active network node would not have obtained the correct sequence numbers or acknowledgement numbers, and would therefore incorrectly handle the ongoing sessions.
One approach to address this problem is to use HSRP preemption to delay the newly active network node from resuming traffic handling. During the delay, the newly active node would request and receive the current NAT table information. However, in this approach HSRP allows only a static, fixed amount of time to be set as the delay before the newly active network node resumes traffic handling. Further, the newly active network node is delayed from performing traffic forwarding until the HSRP preemption timer has expired; therefore, the timer is normally made short, to minimize the length of time during which traffic forwarding cannot occur. Once the timer has expired, the newly active network node resumes traffic handling regardless of whether it has received the current NAT table information.
The use of the static HSRP preemption delay is problematic when the NAT table size cannot be pre-determined. It is hard to determine at configuration time what the NAT table size will be, and therefore accurately setting a fixed delay time is difficult. If the NAT table is large and takes more time to transfer than the HSRP preemption delay allows, the NAT network node will start handling traffic before it has completely received all the NAT information.
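The difficulty with a fixed delay can be illustrated with simple arithmetic. The following sketch assumes, purely for illustration, that NAT entries are transferred at a constant rate; the function name, the rate of 10,000 entries per second, and the 5 second delay are hypothetical values and are not taken from HSRP or from any particular device.

```python
def entries_missing_at_preemption(table_entries: int,
                                  entries_per_second: int,
                                  preempt_delay_seconds: int) -> int:
    """Entries not yet received when a fixed preemption delay expires.

    Assumes a constant transfer rate, which is a simplification.
    """
    received = min(table_entries, entries_per_second * preempt_delay_seconds)
    return table_entries - received


# Same fixed delay, three different (hypothetical) NAT table sizes.
for size in (1_000, 50_000, 500_000):
    missing = entries_missing_at_preemption(
        size, entries_per_second=10_000, preempt_delay_seconds=5
    )
    print(size, missing)
# 1000 0
# 50000 0
# 500000 450000
```

Under these assumed values the fixed delay is adequate for the two smaller tables, but for the largest table the node would resume traffic handling with most entries still missing. Because the table size is not known at configuration time, no single delay value is both short and sufficient for every case.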
Therefore, there is a need for an improved approach to control switchover from an active network node to a standby network node.