1. Field of the Invention
This disclosure relates in general to network computer systems, and more particularly to a method, apparatus and program storage device for providing mutual failover and load-balancing between interfaces in a network.
2. Description of Related Art
Computer systems linked to each other in a network are commonly used in businesses and other organizations. Computer system networks (“networks”) provide a number of benefits for the user, such as increased productivity, flexibility, and convenience as well as resource sharing and allocation.
Networks are configured in different ways depending on implementation-specific details such as the hardware used and the physical location of the equipment, and also depending on the particular objectives of the network. In general, networks include one or more server computer systems, each communicatively coupled to numerous client computer systems.
As the use of networked computer systems increases, the need has arisen to provide additional bandwidth to handle the electronic traffic on the network. For example, inadequate bandwidth can result in data stalling in the pipeline between a client and a server. This stalling can significantly limit network performance.
Network interface cards (NICs) are used to connect a server or any computing device to a network. Such NICs include, for example, Ethernet cards or Token Ring cards that plug into a desktop computer or server. The NIC implements the physical layer signaling and the Media Access Control (MAC) for a computer attached to a network. Multiple NICs effectively attach a computer to a network multiple times. This increases the potential bandwidth into the network proportionally. Multiple NICs also provide resiliency and redundancy if one of the NICs fails. In the case of a failure of a NIC, one of the other NICs is used to handle the traffic previously handled by the failed NIC, thereby increasing overall system reliability. Accordingly, it is necessary to be able to detect when a NIC fails and, when a failed NIC is detected, to switch to a functioning NIC (this is referred to as fault tolerance and failover support).
NICs are typically represented in the host operating system through kernel objects referred to as “network interfaces.” Herein, the network interfaces that are directly used by the Internet Protocol (IP) will be referred to as “IP interfaces.” Furthermore, interfaces directly corresponding to the NICs will be referred to as physical interfaces. Interfaces derived from physical interfaces, as described herein, will be variously referred to as logical or virtual interfaces.
Load balancing is a technique used to reduce data bottlenecks caused by an overloaded communications network. In load balancing, the traffic between a server and a network is shared over multiple NICs. Such load balancing typically requires special software. Load balancing also provides fault tolerance, which maintains data communication between the server and the network in the event of a disruption in a data link. When a link fails, the load is failed over to a backup or secondary link such that signal continuity is maintained.
A well-known technique is to group multiple physical links together so that they appear as a single network interface to the Internet Protocol (IP) layer of the TCP/IP stack. Load balancing and failover are then implemented among the links without the IP layer being aware of it. Examples of such techniques are the ‘bonding’ driver in Linux, EtherChannel, and the IEEE 802.3ad link aggregation standard.
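The grouping described above can be illustrated with a minimal sketch. It is not the Linux bonding driver, EtherChannel, or the IEEE 802.3ad algorithm; the names `Link`, `BondedInterface` and `select_link`, and the CRC-based hash over MAC addresses, are assumptions chosen only for illustration. The sketch shows that the IP layer would see one interface while link selection and failover happen beneath it.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Link:
    """One physical link (hypothetical model of a NIC port)."""
    name: str
    up: bool = True


@dataclass
class BondedInterface:
    """Hypothetical aggregate that is presented to IP as a single interface."""
    name: str
    links: list

    def select_link(self, src_mac: str, dst_mac: str) -> Link:
        # Only links that are currently up may carry traffic.
        active = [link for link in self.links if link.up]
        if not active:
            raise RuntimeError("no active link in " + self.name)
        # Assumed policy: hash the MAC pair so that a given conversation
        # stays on the same link while the set of active links is unchanged.
        key = f"{src_mac}-{dst_mac}".encode()
        return active[zlib.crc32(key) % len(active)]


bond = BondedInterface("bond0", [Link("eth0"), Link("eth1")])
chosen = bond.select_link("00:11:22:33:44:55", "66:77:88:99:aa:bb")
chosen.up = False  # simulate a link failure
fallback = bond.select_link("00:11:22:33:44:55", "66:77:88:99:aa:bb")
print(chosen.name, "->", fallback.name)
```

Because the selection uses only link-level information, nothing in this sketch can consult the routing table, which is the limitation discussed next.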
However, these techniques suffer from several disadvantages. For example, since the system considers the multiple physical links as a single NIC, the load balancing is implemented below the IP layer. In other words, the multitude of NICs is presented as a single interface to the IP protocol. Therefore, network layer information, e.g., the routing table, cannot be used to load balance the data traffic. Generic tools that work at the network layer do not apply either. These disadvantages also apply to the failover mode.
The link aggregation techniques described above further require specialized switches that can treat multiple switch ports as one; in the case of directly connected peer systems, both ends must be configured to support the standard. Furthermore, the failure of the switch causes all the links to lose connectivity. In an alternative mode, which supports failover but not load balancing, the links may be connected to different switches; however, in such a configuration, only one link can be active at a given time.
Load balancing can also be provided at the IP layer, wherein the load is balanced across multiple IP interfaces. The data is load balanced in accordance with the routing table entries, which point to a particular IP interface for a given route. On failure of a NIC, an alternative method for failover must be implemented and the routing table updated, which can take time. The link-level failover described earlier occurs within a short (millisecond) interval, whereas route propagation can take much longer. In addition, the IP address needs to be associated with the backup interface, and the peers must be informed of the failover MAC address. Therefore, a method is required that allows for load balancing at the IP layer while providing fast failover.
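As a rough illustration of load balancing driven by routing table entries, the following sketch maps destination prefixes to IP interfaces and repoints routes when an interface fails. The class `RoutingTable`, its methods, and the interface names are hypothetical and are not taken from any particular operating system; longest-prefix matching is assumed as the lookup rule. The sketch covers only the local table update, not route propagation, address reassignment, or notifying peers of the new MAC address.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    interface: str  # name of the IP interface used for this route


class RoutingTable:
    """Hypothetical routing table: each route points to one IP interface."""

    def __init__(self):
        self.routes = []

    def add(self, prefix: str, interface: str) -> None:
        self.routes.append(Route(ipaddress.ip_network(prefix), interface))

    def lookup(self, dst: str) -> str:
        # Longest-prefix match: the most specific matching route wins.
        addr = ipaddress.ip_address(dst)
        matches = [r for r in self.routes if addr in r.prefix]
        if not matches:
            raise LookupError("no route to " + dst)
        return max(matches, key=lambda r: r.prefix.prefixlen).interface

    def fail_over(self, failed: str, backup: str) -> int:
        # Repoint every route that used the failed interface to the backup.
        moved = 0
        for route in self.routes:
            if route.interface == failed:
                route.interface = backup
                moved += 1
        return moved


table = RoutingTable()
table.add("10.1.0.0/16", "eth0")  # traffic for this prefix leaves via eth0
table.add("10.2.0.0/16", "eth1")  # traffic for this prefix leaves via eth1
print(table.lookup("10.1.5.5"))   # eth0
table.fail_over("eth0", "eth1")
print(table.lookup("10.1.5.5"))   # eth1
```

Spreading prefixes across interfaces in this way balances load at the IP layer, but every affected route must be rewritten on failure, which is why this approach alone tends to be slower than link-level failover.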
It can be seen then that there is a need for a method, apparatus and program storage device for providing mutual failover and load balancing between interfaces at the IP layer in a network.