1. Field of the Invention
The present invention relates generally to methods and apparatuses for transparently providing a failover network device. Specifically, the present invention provides in one embodiment a system for transferring a network function from an active network device to a standby device that is configured the same as the active device but does not become active until the active device fails. The active device and the standby device communicate with each other both over the network and over a separate failover cable that is configured to designate which device is primary and which device is secondary. If both devices start up at the same time, then the primary device becomes active and the secondary device becomes standby. When a positive determination is made that the active device has failed, the secondary device becomes active and assumes the duties of the failed primary device. This procedure occurs without the need for other network devices to change the MAC or IP address to which they are directing packets. The backup network device takes over the active MAC and IP addresses from the failed network device and supports connections to those addresses.
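The role assignment and address takeover summarized above can be modeled as a small state sketch. It is a hypothetical illustration, not the claimed implementation. The names `Role`, `State`, `Unit`, `start_together`, `fail_over` and the example MAC/IP values are assumptions. It also assumes that the failed unit gives up the active addresses so that only one unit answers on them.

```python
from dataclasses import dataclass
from enum import Enum

class Role(Enum):
    PRIMARY = "primary"      # designated by which end of the failover cable is attached
    SECONDARY = "secondary"

class State(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    FAILED = "failed"

# Hypothetical address pairs (MAC, IP); clients only ever use the "active" pair.
ACTIVE_ADDRS = ("00:00:0c:00:00:01", "192.0.2.10")
STANDBY_ADDRS = ("00:00:0c:00:00:02", "192.0.2.11")

@dataclass
class Unit:
    name: str
    role: Role
    state: State = State.STANDBY
    addrs: tuple = STANDBY_ADDRS

def start_together(a: Unit, b: Unit) -> None:
    """Both units boot at once: the primary becomes active, the secondary standby."""
    for unit in (a, b):
        if unit.role is Role.PRIMARY:
            unit.state, unit.addrs = State.ACTIVE, ACTIVE_ADDRS
        else:
            unit.state, unit.addrs = State.STANDBY, STANDBY_ADDRS

def fail_over(failed: Unit, standby: Unit) -> None:
    """The standby assumes the active MAC and IP addresses of the failed unit.

    Assumption: the failed unit relinquishes the active addresses.
    """
    failed.state, failed.addrs = State.FAILED, STANDBY_ADDRS
    standby.state, standby.addrs = State.ACTIVE, ACTIVE_ADDRS
```

Because the secondary takes over `ACTIVE_ADDRS`, other devices keep addressing the same MAC and IP after the failover and need no reconfiguration.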
2. Description of the Related Art
Services provided over networks, intranets, and internets have been increasing in complexity. As a result, increasingly complex schemes have been developed to respond to client-generated network traffic and to service client requests. In some of these schemes, a single device is placed on the network that is responsible for directing packets to other devices, or for filtering packets bound for a number of other devices, for a purpose such as security or load balancing. Such devices, when implemented, are critical to the operation of a network because they often represent a single point of failure that may prevent either the entire network or a substantial portion of the network from functioning.
One such critical device is a Local Director as described in U.S. patent application Ser. Nos. 08/850,248, 08/850,730 and 08/850,836 (Attorney Docket Nos. CISCP005, CISCP007, and CISCP008), which were incorporated by reference above. In one implementation, the Local Director is a device for spreading connections made by clients to a given IP address among a number of servers. The Local Director includes a session distribution scheme that efficiently distributes the connection load among a group of servers by determining which of the servers available to it is likely to handle the load efficiently.
In a forward multiplexing mode, the Local Director not only distributes connections for a single IP address to many network devices, but also distributes connections for many different IP addresses to different ports of a single network device that runs applications that service connections for the different IP addresses. The Local Director may implement a large number of virtual network devices which service connections using a set of physical network devices that are made available to the Local Director. The system is an especially robust one because certain physical network devices can be configured as backups for other physical network devices that support a virtual network device. In fact, in one embodiment, virtual network devices that each access a group of physical network devices to handle connections may be defined as backups for each other.
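The forward multiplexing mode described above amounts to a lookup from a virtual destination to a port on a shared physical device, followed by a header rewrite. The sketch below is hypothetical. The table `FORWARD_MAP`, the `Header` type, the `translate` function and all addresses and ports are illustrative assumptions, and only the destination fields are modeled.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Header:
    dst_ip: str
    dst_port: int

# Hypothetical forward-multiplexing table: each virtual (IP, port) maps to a
# distinct port on one physical server that runs a separate application for it.
FORWARD_MAP = {
    ("192.0.2.10", 80): ("198.51.100.5", 8001),
    ("192.0.2.11", 80): ("198.51.100.5", 8002),
}

def translate(header: Header) -> Header:
    """Rewrite the destination so the packet reaches the physical device."""
    phys_ip, phys_port = FORWARD_MAP[(header.dst_ip, header.dst_port)]
    return replace(header, dst_ip=phys_ip, dst_port=phys_port)
```

Both virtual addresses in this example resolve to the same physical host, 198.51.100.5, and are distinguished only by port.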
The flexibility of the Local Director in selecting different physical or virtual network devices to handle incoming connection requests is an advantage, since failure of individual network devices will not seriously degrade network performance so long as other physical network devices available to the Local Director are capable of assuming more of the load. On the other hand, the presence of the Local Director on the network as a central point that distributes connections among a large number of physical network devices could present a significant risk of catastrophic system failure. The Local Director is a potential single point of failure that could prevent the use of all of the servers connected to it. Such a single point of failure is unacceptable in many network systems. What is needed, therefore, is a reliable failover system that provides a backup for the Local Director.
FIG. 1 is a block diagram illustrating a typical backup system. A client 100 is connected to a primary network device 110 that provides some sort of network service. In the example shown, client 100 is connected through primary network device 110 to a number of physical servers 112a, 112b and 112c. The physical servers actually handle connection requests made by the client, and connections are distributed to the physical servers by the primary network device. Thus, if the primary network device fails, the physical servers will be unable to service connections requested by client 100. The primary network device 110 is therefore a potential single point of failure. In order to prevent the failure of the primary network device 110 from shutting down the entire system, a backup network device 120 is provided. The backup network device 120 may or may not be physically connected to the physical servers while the primary network device is operational. When primary network device 110 fails, backup network device 120 either makes a physical connection to the physical servers or uses its existing physical connection to take over the function of primary network device 110 and distribute client connection requests.
The switch from primary network device 110 to backup network device 120 requires the client to sense that a failure has occurred. Client 100 stores the primary MAC address and primary IP address of primary network device 110 in a register 102. Client 100 continues to use the primary MAC address and primary IP address so long as primary network device 110 is operational. When primary network device 110 fails, client 100 must detect that failure and determine that a switch to the backup network device is necessary. The backup device MAC address and backup device IP address are stored at the client in a register 104. The client runs an application that changes from connecting to the primary MAC address and primary IP address to connecting to the backup MAC address and backup IP address. In this implementation, the client must be able to detect failure of the primary network device. The client must also store a backup IP address and MAC address for the backup network device and must change the relevant packet headers accordingly, so that packets are sent to the backup network device when the primary network device fails.
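The client-side burden of this prior approach can be sketched as follows. This is a hypothetical model: the `Client` class, its `destination` method and the `is_primary_alive` probe are illustrative assumptions. The two fields stand in for registers 102 and 104.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Address = Tuple[str, str]  # (MAC, IP)

@dataclass
class Client:
    primary: Address         # models register 102
    backup: Address          # models register 104
    use_backup: bool = False

    def destination(self, is_primary_alive: Callable[[Address], bool]) -> Address:
        """Return the (MAC, IP) to place in outgoing packet headers.

        The client itself must detect the failure; once it switches to the
        backup it stays there (an assumption of this sketch).
        """
        if not self.use_backup and not is_primary_alive(self.primary):
            self.use_backup = True
        return self.backup if self.use_backup else self.primary
```

Every client must carry this logic and both address pairs. This is the requirement the transparent failover described above is meant to remove.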
Cisco Systems, Inc., in San Jose, Calif. has developed a Hot Standby Router Protocol (HSRP) that enables routers to be organized into groups, with one router selected as the active router and other routers in the group acting as hot standbys. It would be useful if a suitable hot standby scheme could be developed for other critical network devices such as the Local Director. To be effective, a failover system for a device such as the Local Director should provide a number of features. First, it is important that failure of the Local Director be detected quickly. Since the Local Director must intercept packets and translate addresses to direct connections among a large group of physical network devices, it is important to detect failure of the Local Director as soon as possible before a significant amount of network traffic backs up.
Avoiding a false indication of a failure is as important as quickly detecting failures that occur. When failures occur, it may not be practical to hand over connections from the failed Local Director to the backup. Connections therefore would be dropped and reestablished. Reestablishing multiple connections with the backup will likely generate a large volume of network traffic and could cause delay. Therefore it is important that the Local Director not be failed and traffic not be diverted to the backup unless an actual failure has occurred. It is also important that a condition where a Local Director is toggling between a failed mode and a good mode be avoided. This could occur, for example, if the Local Director is unable to process packets but appears to have a network interface that is functioning. Once the connection load is removed and transferred to the backup, the failed Local Director may appear normal. If the system were allowed to toggle back to the failed Local Director in such a case without intervention, repeated failures and transfers of connections would result.
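One common way to meet both requirements above is a detector that declares failure only after several consecutive missed health checks and then latches that decision until an operator resets it. The sketch below illustrates that general technique. It is not necessarily the mechanism of this invention, and the `FailureDetector` class and its `threshold` value are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureDetector:
    threshold: int = 3     # hypothetical: consecutive misses before declaring failure
    misses: int = 0
    failed: bool = False   # latched once set

    def record(self, healthy: bool) -> bool:
        """Feed one health-check result; return True if the unit is considered failed."""
        if self.failed:
            # Stay failed even if the unit looks normal once its load is removed.
            return True
        self.misses = 0 if healthy else self.misses + 1
        if self.misses >= self.threshold:
            self.failed = True
        return self.failed

    def reset(self) -> None:
        """Operator intervention clears the latch."""
        self.misses = 0
        self.failed = False
```

The threshold suppresses false indications caused by a single lost check. The latch prevents the system from toggling back to a unit that only appears healthy after its connections have been moved.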
Finally, it would be desirable if the failure of the primary Local Director and the subsequent transfer of connections to a backup Local Director could be effected without requiring any of the client devices connecting to the Local Director to detect the failure, determine that the failure occurred, or otherwise determine that a switch to a backup Local Director must be made. This is particularly important because one advantage of the Local Director is that in many applications, the client is unaware that the Local Director exists. For example, in forward multiplexing mode, the Local Director intercepts packets sent to an IP address that corresponds to a virtual network device and translates the packet headers so that they are directed to a physical network device that may implement a large number of virtual network devices. In such an application, the client does not know that the virtual network device is a virtual device or that the Local Director even exists. It would therefore not be possible for the client to detect failure of the Local Director and route future requests to a backup Local Director. It is also desirable that the transfer to a backup Local Director be made transparently, so that it is not evident to the client that a transfer occurred.
In view of the foregoing, there is a need for methods and apparatuses that provide a reliable, transparent failover for a network device that would otherwise represent a single point of failure for a large system.