The present invention relates generally to network systems using redundant or standby devices working together in a redundancy group to provide a virtual router service. More particularly, the present invention relates to methods and apparatus for providing improved failover notification to redundancy group members when the primary member ceases to function, allowing failover (that is, change over) to another group member and thus allowing continued virtual router service.
As noted above, local area networks (LANs) are commonly connected with one another through one or more routers so that a host (a PC or other arbitrary LAN entity) on one LAN can communicate with hosts on other LANs. Typically, the host can communicate directly only with the entities on its local LAN segment. When it receives a request to send a data packet to an address that it does not recognize as local, it communicates through a router (or other layer-3 or gateway device), which determines how to direct the packet between the host and the destination address in a remote network. Unfortunately, a router may become inoperative for a variety of reasons (e.g., a power failure, a reboot, or scheduled maintenance), creating a trigger event. The possibility of such router failure has led to the development and use of redundant systems, which have more than one router to provide a backup in the event of primary router failure. When a router fails, a host communicating through the inoperative router may still communicate with other LANs if it can send packets to another router connected to its LAN.
Various protocols have been devised to allow a host to choose a router from among a group of routers in a network. Two of these, Routing Information Protocol (RIP) and ICMP Router Discovery Protocol (IRDP), are examples of protocols that involve dynamic participation by the host. However, because both RIP and IRDP require that the host be dynamically involved in router selection, performance may be reduced and special host modifications and management may be required.
In a widely used and somewhat simpler approach, the host recognizes only a single “default router” (which also may be referred to as a “default gateway” in some instances). As will be appreciated by those skilled in the art, the terms “router” and “gateway” may in many instances be used interchangeably in this disclosure. In this approach, the host is configured to send data packets to the default router whenever it needs to reach addresses outside its own LAN. It does not keep track of available routers or make decisions to switch to different routers. This requires very little effort on the host's part, but carries a serious risk: if the default router fails, the host cannot send packets outside of its LAN. This may be true even when a redundant router is able to take over, because the host does not know about the backup. Unfortunately, such systems have been used in mission-critical applications such as stock trading.
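The single-default-router model can be illustrated with a minimal Python sketch of the next-hop decision a host makes. It assumes an IPv4 host with exactly one configured default gateway; the function name `next_hop` and all addresses are hypothetical. Note that nothing in the sketch reacts to gateway failure, which is the weakness described above.

```python
import ipaddress


def next_hop(host_iface: str, default_gateway: str, destination: str) -> str:
    """Return the IP address a host would hand a packet to (illustrative only).

    host_iface: the host's own address with prefix length, e.g. "10.0.0.5/24".
    default_gateway: the single configured default router (hypothetical value).
    destination: the packet's destination IP address.
    """
    local_net = ipaddress.ip_interface(host_iface).network
    dest = ipaddress.ip_address(destination)
    if dest in local_net:
        # On-link destination: deliver directly on the local LAN segment.
        return str(dest)
    # Off-link destination: the host never chooses among routers; it always
    # sends to its one default gateway, whether or not that gateway is alive.
    return default_gateway


if __name__ == "__main__":
    print(next_hop("10.0.0.5/24", "10.0.0.1", "10.0.0.42"))  # 10.0.0.42
    print(next_hop("10.0.0.5/24", "10.0.0.1", "192.0.2.7"))  # 10.0.0.1
```

Because the gateway address is fixed configuration, making that address survive a router failure (rather than changing the host) is what the redundancy protocols discussed below aim to do.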
The shortcomings of these early systems led to the development and implementation of redundant gateway systems, which allow for failover recovery. “Failover” is defined as the substitution of a new gateway device for one that has failed or is otherwise not available, wherein the new gateway device assumes the duties and functionalities of the failed device. For example, a gateway device operating in a standby operating mode may take over for another gateway device that was operating in an active operating mode prior to its failure.
One such system is the Hot Standby Router Protocol (HSRP) by Cisco Systems, Inc. of San Jose, Calif. A more detailed discussion of the earlier systems and of an HSRP type of system can be found in U.S. Pat. No. 5,473,599 (referred to herein as “the '599 patent”), entitled STANDBY ROUTER PROTOCOL, issued Dec. 5, 1995 to Cisco Systems, Inc., which is incorporated herein by reference in its entirety for all purposes. Also, HSRP is described in detail in RFC 2281, entitled “Cisco Hot Standby Router Protocol (HSRP)”, by T. Li, B. Cole, P. Morton and D. Li, which is incorporated herein by reference in its entirety for all purposes.
Another redundancy gateway system is the Virtual Router Redundancy Protocol (VRRP), which is an election protocol that dynamically assigns responsibility for packet forwarding to one of a group of VRRP routers on a LAN. A VRRP router is configured to run the VRRP protocol in conjunction with one or more other routers attached to a LAN. In a VRRP setup, one router is elected as the “Master” router with the other routers acting as “Backup” in case of the failure of the Master router. VRRP is described in detail in RFC 2338, entitled “Virtual Router Redundancy Protocol”, by S. Knight, et al., which is incorporated herein by reference in its entirety for all purposes.
HSRP is widely used to back up primary routers for a network segment. In HSRP, a “Standby” router is designated as the back-up to an “Active” router. The Standby router is linked to the network segment or segments serviced by the Active router. The Active and Standby routers share a “virtual IP address” and possibly a “virtual Media Access Control (MAC) address” which is actually in use by only one router at a time. All Internet communication from the relevant private network employs the virtual IP and MAC addresses. At any given time, the Active router is the only router adopting and using the virtual addresses. Then, if the Active router should cease operation for any reason, the Standby router takes over the Active router's load (by adopting the virtual addresses). This allows the host to always direct data packets to an operational router without monitoring the routers of the network.
A Cisco HSRP system is shown in FIGS. 1A and 1B. As seen in FIG. 1A, four gateways 110A–D operate in a normal mode, providing redundant default gateway services in an active/standby configuration for a common IP subnet. In FIG. 1A, the multiple routers 110A–D form a redundancy group 108 (RG) and share a virtual MAC address 118 and a virtual IP address 116. Hosts 120A–C on a common subnet 130 set their default gateway IP address 126 and MAC address ARP cache 128 to the virtual addresses 116, 118 within RG 108 for their subnet. In an RG 108 of a prior HSRP system, an “active” RG member 110A (for example, an “Active HSRP enabled router”) is elected based on pre-configured priorities or other suitable criteria and/or methodologies.
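The priority-based election of the Active member can be sketched as follows. This is a simplified illustration, not an HSRP implementation: it assumes the common HSRP convention that the highest priority wins and ties are broken by the higher interface IP address, and it ignores timers and preemption. The class `RgMember` and function `elect` are hypothetical names, and the member labels merely echo the reference numerals of FIG. 1A.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class RgMember:
    name: str
    ip: str          # the member's own (real) interface address
    priority: int    # pre-configured priority; higher is preferred
    alive: bool = True


def elect(members):
    """Return (active, standby) among live members (simplified sketch).

    Ranking key: priority first, then interface IP address, both descending.
    """
    live = [m for m in members if m.alive]
    ranked = sorted(
        live,
        key=lambda m: (m.priority, ipaddress.ip_address(m.ip)),
        reverse=True,
    )
    active = ranked[0] if ranked else None
    standby = ranked[1] if len(ranked) > 1 else None
    return active, standby


if __name__ == "__main__":
    group = [
        RgMember("110A", "10.0.0.2", 120),
        RgMember("110B", "10.0.0.3", 110),
        RgMember("110C", "10.0.0.4", 100),
        RgMember("110D", "10.0.0.5", 100),
    ]
    active, standby = elect(group)
    print(active.name, standby.name)  # 110A 110B
```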
The initial Active router 110A of the RG 108 responds to all address resolution protocol (“ARP”) requests (or any similar or analogous mechanisms used by the router for providing address information to requesting parties) for the virtual IP address 116, thus providing default gateway services for all hosts 120 of the common subnet 130 during normal operation. During normal operation, a secondary RG member of the RG 108 (for example, member 110B in FIG. 1A) remains in a “Standby” mode. If the primary member 110A of the RG 108 should fail, as shown in FIG. 1B, the Standby router 110B will assume the virtual MAC address 118 and the virtual IP address 116, thus effectively becoming the primary member (or “Active router”) and thereafter providing uninterrupted gateway services to all of the hosts 120 of common subnet 130 without the need for additional ARP discovery and/or resolution. This configuration provides a reliable failover function for the gateway devices.
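Why hosts need no new ARP resolution after failover can be shown with a small simulation. It is a sketch under simplifying assumptions, not protocol code: the live member with the highest priority is treated as Active (real HSRP hands over to the elected Standby and has preemption rules not modeled here), and the names `Router`, `RedundancyGroup`, and `arp_reply`, as well as the address values, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Router:
    name: str
    priority: int
    alive: bool = True


@dataclass
class RedundancyGroup:
    virtual_ip: str
    virtual_mac: str
    members: list = field(default_factory=list)

    def active(self) -> Optional[Router]:
        # Simplification: the live member with the highest priority owns
        # the virtual IP and virtual MAC addresses.
        live = [r for r in self.members if r.alive]
        return max(live, key=lambda r: r.priority, default=None)

    def arp_reply(self, target_ip: str) -> Optional[str]:
        # Only the current Active router answers ARP for the virtual IP,
        # and it always answers with the shared virtual MAC, so the answer
        # does not change when a different member becomes Active.
        if target_ip == self.virtual_ip and self.active() is not None:
            return self.virtual_mac
        return None


if __name__ == "__main__":
    rg = RedundancyGroup(
        "10.0.0.1",
        "00:00:0c:07:ac:01",
        [Router("110A", 120), Router("110B", 110)],
    )
    # Host resolves its default gateway once and caches the result.
    host_arp_cache = {rg.virtual_ip: rg.arp_reply(rg.virtual_ip)}
    print(rg.active().name)  # 110A
    rg.members[0].alive = False  # primary member fails, as in FIG. 1B
    print(rg.active().name)  # 110B
    # The cached entry still matches what the new Active router would answer.
    print(host_arp_cache[rg.virtual_ip] == rg.arp_reply(rg.virtual_ip))  # True
```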
VRRP provides a service that is functionally similar to HSRP. VRRP denotes the HSRP Active Router as a Master Router and any HSRP Standby/Listen Routers as Backup Routers. VRRP employs a virtual IP address and a virtual MAC address mechanism in a manner analogous to HSRP, providing hosts with a default Virtual Router or Gateway for communicating outside of the local LAN.
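The virtual MAC mechanism shared by both protocols derives the MAC address from the group identifier, so every member of a group can adopt the same address. The sketch below follows the formats given in RFC 2281 (HSRP version 1: `0000.0c07.acXX`, with XX the group number) and RFC 2338 (VRRP: `00-00-5E-00-01-{VRID}`); the helper function names are hypothetical, and other HSRP versions use different formats not shown here.

```python
def hsrp_v1_virtual_mac(group: int) -> str:
    """Virtual MAC for HSRP version 1 group `group` (RFC 2281 format)."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP v1 group number must be 0-255")
    # Last octet carries the group number in hexadecimal.
    return f"00:00:0c:07:ac:{group:02x}"


def vrrp_virtual_mac(vrid: int) -> str:
    """Virtual MAC for VRRP virtual router ID `vrid` (RFC 2338 format)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRRP VRID must be 1-255")
    # Last octet carries the VRID in hexadecimal.
    return f"00:00:5e:00:01:{vrid:02x}"


if __name__ == "__main__":
    print(hsrp_v1_virtual_mac(1))  # 00:00:0c:07:ac:01
    print(vrrp_virtual_mac(10))    # 00:00:5e:00:01:0a
```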
First-hop redundancy protocols such as VRRP and HSRP typically have a failover period of several seconds during which traffic is not forwarded. This delay is due to the detection mechanism used in these protocols. In VRRP, the failure of a Master router is detected through the non-receipt of Master advertisements for three advertisement (Hello) periods plus a skew time. For example, a typical advertisement period is 1 second, meaning that a failed Master typically would be detected after 3–4 seconds. Such delays are unacceptable in a high-availability routing environment in which only the active router or gateway device is supposed to route traffic. HSRP employs a similar failover mechanism, which typically waits for three Hello periods (referred to in HSRP as the Holdtime) before changing over to a new Active router or other gateway device.
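The 3–4 second figure follows from the VRRP timer arithmetic in RFC 2338, where Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time and Skew_Time = (256 − Priority) / 256 for a Backup router. The sketch below simply evaluates that formula; the function name is hypothetical, and it models only detection time, not the additional time a new Master needs to begin forwarding.

```python
def vrrp_master_down_interval(advert_interval_s: float, backup_priority: int) -> float:
    """Seconds a VRRP Backup waits without advertisements before taking over."""
    if not 1 <= backup_priority <= 254:
        raise ValueError("Backup priority must be 1-254")
    # Higher-priority backups get a shorter skew, so they take over first.
    skew_time = (256 - backup_priority) / 256
    return 3 * advert_interval_s + skew_time


if __name__ == "__main__":
    # With the typical 1-second advertisement interval:
    print(vrrp_master_down_interval(1.0, 100))  # 3.609375
    print(vrrp_master_down_interval(1.0, 254))  # 3.0078125
```

For every valid Backup priority the result stays between 3 and 4 seconds at a 1-second interval, which is the window during which traffic to the virtual router can go unforwarded.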
In view of the foregoing, it would be desirable to provide redundant gateway services having improved failover notification.