The present invention relates to the field of computer system networks. In particular, the present invention pertains to a software-based module for augmenting a server computer system to perform network interface card load balancing and fail over support for fault tolerance.
Computer systems linked to each other in a network are commonly used in businesses and other organizations. Computer system networks ("networks") provide a number of benefits for the user, such as increased productivity, flexibility, and convenience as well as resource sharing and allocation.
Networks are configured in different ways depending on implementation-specific details such as the hardware used and the physical location of the equipment, and also depending on the particular objectives of the network. In general, networks include one or more server computer systems, each communicatively coupled to numerous client computer systems.
One common type of network configuration includes a number of virtual local area networks (VLANs). VLANs provide numerous advantages, a primary advantage being that the client computer systems associated with a particular server computer system do not all need to be in the same physical location.
In contemporary networks, server computer systems are typically coupled to the network using more than one network interface card (NIC). Multiple NICs increase the total available bandwidth capacity for transmitting and receiving data packets. Multiple NICs also provide resiliency and redundancy if one of the NICs fails. In the case of a failure of a NIC, one of the other NICs is used to handle the traffic previously handled by the failed NIC, thereby increasing overall system reliability. Accordingly, it is necessary to be able to detect when a NIC fails and, when a failed NIC is detected, to switch to a functioning NIC (this is referred to as fault tolerance and fail over support).
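The fail over step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the server keeps a per-NIC link status list (`link_up`, a hypothetical name) and that, when the preferred NIC has failed, traffic moves to the next functioning NIC in index order. The source does not specify how the replacement NIC is chosen; this ordering is one simple possibility.

```python
def fail_over(preferred: int, link_up: list[bool]) -> int:
    """Return the preferred NIC if its link is up, else the next functioning NIC.

    link_up is a hypothetical per-NIC status list (True = NIC is functioning).
    """
    n = len(link_up)
    for offset in range(n):
        candidate = (preferred + offset) % n  # wrap around the NIC list
        if link_up[candidate]:
            return candidate
    raise RuntimeError("no functioning NIC available")
```

For example, with three NICs where NIC 1 has failed, traffic that would have used NIC 1 is redirected to NIC 2, while traffic already on NIC 0 is unaffected.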
In addition, it is desirable to balance the traffic over each NIC when multiple NICs are used so that one NIC doesn't handle too much traffic and become a bottleneck (this is referred to as load balancing). The use of load balancing allows the spare capacity provided by the multiple NICs to be effectively utilized. It is not necessary to hold a NIC in reserve in case one of the NICs fails; instead, all NICs can be used, thereby increasing the overall performance of the server computer system and hence the network.
Prior Art FIG. 1 is an illustration of exemplary network 50 including two VLANs. In network 50, client computer system 140 (e.g., a workstation) is in one VLAN, and client computer systems 141, 142 and 143 are in a second VLAN. Both VLANs are serviced by server computer system 160. A data packet sent by server computer system 160 contains address information that is used to identify the particular client computer system(s) to which the data packet is to be sent. In addition, the data packet is tagged with a VLAN identifier that identifies the destination VLAN. The methods for addressing a data packet in a network comprising multiple VLANs are well known in the art; one method is defined by the IEEE 802.1Q standard.
Switches 150 and 151 are able to read the VLAN identifier and the other address information contained in the data packet and direct the data packet accordingly. Thus, switch 150 reads the VLAN identifier and will direct the data packet to client computer system 140 if appropriate. Otherwise, the data packet proceeds to switch 151, which directs the data packet to the proper client computer system (e.g., client computer systems 141, 142 or 143) depending on the address information contained in the data packet.
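As background on how a switch can read the VLAN identifier, the sketch below extracts the 12-bit VLAN ID from an IEEE 802.1Q-tagged Ethernet frame: a Tag Protocol Identifier of 0x8100 follows the two 6-byte MAC addresses, and the low 12 bits of the following 16-bit Tag Control Information field carry the VLAN ID. The `forward` function and its `port_for_vlan` table are hypothetical simplifications of a switch's forwarding decision, not a description of switches 150 and 151.

```python
import struct

TPID_8021Q = 0x8100  # Tag Protocol Identifier marking an 802.1Q-tagged frame


def vlan_id(frame: bytes):
    """Return the 12-bit VLAN ID of an 802.1Q-tagged Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: TPID (or EtherType if untagged).
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_8021Q:
        return None
    # Bytes 14-15: Tag Control Information (3-bit priority, 1-bit DEI, 12-bit VLAN ID).
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF


def forward(frame: bytes, port_for_vlan: dict):
    """Toy switch decision: look up an output port by VLAN ID (hypothetical table)."""
    return port_for_vlan.get(vlan_id(frame))
```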
One prior art technique for load balancing and fault tolerance with fail over support utilizes a switch-dependent protocol implemented using server computer system 160 and switches 150 and 151. This prior art technique also requires NICs that are specifically designed for compatibility with switches 150 and 151 and the switch-dependent protocol. This prior art technique is problematic because it requires that the switch be designed with the capability to implement the load balancing and fault tolerance schemes. Thus, the complexity and the cost of the switch are substantially increased. Even so, the capabilities of the switch are relatively limited, and so the schemes for providing load balancing and fault tolerance are also limited.
With regard to load balancing, other prior art techniques attempt to address the drawbacks identified above by using software-based load balancing methods executed on a server computer system. These methods are based on either a round-robin approach or an approach using the media access control (MAC) address that is associated with each NIC (a unique MAC address is assigned to each NIC by the vendor of the NIC).
In a round-robin load balancing approach, a first data packet is sent out using a first NIC, a second data packet with a second NIC, and so on; when all NICs have been used to send out a data packet, the sequence returns to the first NIC and the cycle is continuously repeated. However, the round-robin load balancing approach is problematic because multiple data packets are typically associated with a given session or transaction between a server computer system and a client computer system. Thus, multiple NICs may be used for a single transaction involving multiple data packets. Consequently, the data packets for that transaction often reach the client computer system out of order. Some computer system protocols are not able to properly handle out-of-order data packets and so the data packets have to be retransmitted until they are received by the client computer system in the proper order. Thus, the round-robin load balancing approach causes a high incidence of retransmissions that increase the time needed to complete a transaction and reduce the overall performance of the computer system network.
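The effect described above can be seen in a minimal sketch of round-robin assignment. The packet and NIC names are hypothetical; the point is only that consecutive packets of one transaction are spread over every NIC, which is what allows them to arrive out of order.

```python
from itertools import cycle


def round_robin_assign(packets, nics):
    """Assign each outgoing packet to the next NIC in a fixed rotation."""
    rotation = cycle(nics)
    return [(pkt, next(rotation)) for pkt in packets]


# One hypothetical transaction of four packets sent over three NICs.
session = ["seg1", "seg2", "seg3", "seg4"]
assignment = round_robin_assign(session, ["nic0", "nic1", "nic2"])
nics_used = {nic for _, nic in assignment}
```

Here the single four-packet transaction uses all three NICs (nic0, nic1, nic2, then nic0 again), so its packets travel over independent paths and may reach the client out of order.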
In the MAC-based load balancing approach, the NIC is chosen by applying some type of procedure that associates a NIC with a particular MAC address so that, in essence, a data packet bound for that MAC address is always sent out over the same NIC. While this addresses the problem of out-of-order data packets associated with the round-robin load balancing approach, the MAC-based approach introduces additional problems. For instance, all client computer systems lying across a particular router will be assigned to the same NIC, because the data packets bound for those client computer systems all carry the same destination MAC address (the MAC address of the router); hence, if traffic across this router is normally heavy relative to other routers, the load over the associated NIC will not be balanced relative to the other NICs. In addition, the MAC-based load balancing approach depends on the distribution of MAC addresses across the client computer systems, and because the MAC addresses may not be evenly distributed (e.g., one router may serve more client computer systems than another router), the load across the NICs will not be evenly balanced. Also, the bandwidth available to a client computer system is limited by the bandwidth of the NIC with which that client computer system is associated by its MAC address; for example, a client computer system with a one gigabit/second NIC may, by virtue of its MAC address, be assigned to a 100 megabits/second NIC, in which case the bandwidth of the client computer system is limited to 100 megabits/second.
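The router problem can be illustrated with a minimal sketch. The prior art mapping procedure is not specified, so the rule below (low-order byte of the destination MAC address modulo the number of NICs) is a hypothetical stand-in, as are the router MAC address and client IP addresses. Any rule keyed only on the destination MAC address behaves the same way for clients behind a router.

```python
def mac_based_select(dest_mac: str, num_nics: int) -> int:
    """Hypothetical MAC-based rule: map a destination MAC address to a NIC index."""
    # Use the low-order byte of the MAC address as a simple key (assumed rule).
    last_byte = int(dest_mac.split(":")[-1], 16)
    return last_byte % num_nics


ROUTER_MAC = "00:a0:c9:11:22:33"  # hypothetical router MAC address

# Frames to clients behind the router all carry the router's MAC as their
# destination, so every one of them is sent on the same NIC.
clients_behind_router = ["10.1.0.5", "10.1.0.6", "10.1.0.7"]
choices = {ip: mac_based_select(ROUTER_MAC, 4) for ip in clients_behind_router}
```

With four NICs, all three clients map to NIC 3, however many clients sit behind the router and however heavy their combined traffic is.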
With regard to fault tolerance and fail over support, a problem exists in the prior art when certain protocols, such as Internetwork Packet Exchange (IPX), are used to address outgoing data packets. With IPX, when a data packet is transmitted from a server computer system over a particular NIC, the response from the client computer system is automatically returned to the same NIC. Thus, if the NIC used for the outgoing data packet fails, the client computer system must instead send its response to a different NIC.
However, the prior art is problematic because a mechanism is typically not in place for informing the client computer system of the failed NIC. Even if such a mechanism is present, the client computer system may send a data packet before it is informed of the failed NIC.
In addition, with IPX, if the address in the data packet does not correspond to the proper NIC address, then the data packet is dropped (i.e., not delivered to the server computer system). Therefore, the prior art is also problematic because a return data packet sent to a functioning NIC will be dropped if its address does not correspond to the address of that NIC. For example, a client computer system addresses a data packet with a particular NIC address and the data packet is transmitted; however, in the interim the addressed NIC fails, and so the data packet is routed to a different NIC. With IPX, because the NIC address in the data packet does not correspond to the NIC to which the data packet was routed, the data packet is dropped.
Accordingly, a need exists for a system and method for load balancing and fault tolerance wherein the system and method are not limited by the capabilities of a switch. A need further exists for a system and method that satisfy the above needs and do not cause data packets to be transmitted or received out of order and also overcome the shortcomings associated with a MAC-based load balancing approach described above. In addition, a need exists for a system and method that satisfy the above needs and provide fault tolerance and fail over support for the IPX protocol.
The present invention provides a system and method for fault tolerance and for load balancing traffic across network interface cards (NICs) wherein the system and method are not limited by the capabilities of a switch. Furthermore, the present invention provides a system and method that satisfy the above needs and do not cause data packets to be transmitted or received out of order and also overcome the shortcomings associated with a MAC-based approach; namely, the present invention does not assign the same NIC to all traffic across a router, balances the load more evenly across all NICs, and does not unnecessarily limit the bandwidth available to a client computer system. In addition, the present invention provides a system and method that satisfy the above needs and provide fault tolerance and fail over support for the Internetwork Packet Exchange (IPX) protocol.
Specifically, in one embodiment, the present invention pertains to a system and method implemented on a server computer system having a plurality of NICs coupled thereto, wherein the system and method are used to select a NIC for sending an outgoing data packet from the server computer system. The outgoing data packet is addressed using an IPX address and a socket number. A load balancing scheme is executed in order to select a NIC from the plurality of NICs. The media access control (MAC) address that represents the selected NIC is inserted in the outgoing data packet. The data packet is then sent using the selected NIC.
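The send path described above (select a NIC, insert that NIC's MAC address, transmit on that NIC) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `Nic` and `IpxPacket` classes are hypothetical, the IPX address is treated as a single integer, the selected NIC's MAC address is assumed to go into the packet's source MAC field, and the load balancing scheme is passed in as a function.

```python
from dataclasses import dataclass, field


@dataclass
class Nic:
    """Hypothetical NIC model: a MAC address and a record of transmitted packets."""
    mac: bytes
    sent: list = field(default_factory=list)

    def transmit(self, packet):
        self.sent.append(packet)


@dataclass
class IpxPacket:
    """Hypothetical outgoing packet addressed by IPX address and socket numbers."""
    dest_ipx_address: int  # assumed integer form of the destination IPX address
    dest_socket: int
    source_socket: int
    payload: bytes
    source_mac: bytes = b""  # filled in once a NIC has been selected


def send(packet: IpxPacket, nics: list, select) -> int:
    """Run the load balancing scheme, stamp the chosen NIC's MAC, and send on it."""
    index = select(packet, len(nics))
    nic = nics[index]
    packet.source_mac = nic.mac  # MAC address representing the selected NIC
    nic.transmit(packet)
    return index
```

Passing the scheme as a parameter keeps the send path independent of which of the selection formulas is used.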
In one embodiment, the load balancing scheme is a function of the IPX address. In that embodiment, the load balancing scheme is defined by:
SelectedNIC = IPXAddress MOD NumberNICs;
wherein "SelectedNIC" is the selected NIC, "IPXAddress" is the IPX address of the outgoing data packet, and "NumberNICs" is the number of NICs coupled to the server computer system.
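The MOD scheme above translates directly into code. In this minimal sketch the IPX address is assumed to be available as an integer (the text does not say whether the network number, node number, or both are used), and the NIC index is zero-based.

```python
def select_nic_mod(ipx_address: int, number_nics: int) -> int:
    """SelectedNIC = IPXAddress MOD NumberNICs (zero-based NIC index)."""
    if number_nics <= 0:
        raise ValueError("at least one NIC is required")
    return ipx_address % number_nics
```

For example, a hypothetical node address of 0x00A0C9112233 with four NICs selects NIC 3. Every packet to that address therefore uses the same NIC, which keeps its packets in order.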
In alternate embodiments, the load balancing scheme is a function of the IPX address and the destination socket number, the source socket number, or both. In these embodiments, the load balancing scheme is defined by:
SelectedNIC = (IPXAddress XOR DestIPXSocket) MOD NumberNICs; or
SelectedNIC = (IPXAddress XOR SourceIPXSocket) MOD NumberNICs; or
SelectedNIC = (IPXAddress XOR SourceIPXSocket XOR DestIPXSocket) MOD NumberNICs;
wherein "SelectedNIC" is the selected NIC, "IPXAddress" is the IPX address of the outgoing data packet, "SourceIPXSocket" is the socket number for a source socket of the outgoing data packet, "DestIPXSocket" is the socket number for a destination socket of the outgoing data packet, and "NumberNICs" is the number of NICs coupled to the server computer system.
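The three XOR-based schemes can be sketched as follows, again assuming an integer IPX address and a zero-based NIC index. In Python, `%` binds more tightly than `^`, so the parentheses are required to match the formulas. The socket numbers in the usage note are hypothetical values chosen for illustration.

```python
def select_nic_dest(ipx_address: int, dest_socket: int, number_nics: int) -> int:
    """SelectedNIC = (IPXAddress XOR DestIPXSocket) MOD NumberNICs."""
    return (ipx_address ^ dest_socket) % number_nics


def select_nic_source(ipx_address: int, source_socket: int, number_nics: int) -> int:
    """SelectedNIC = (IPXAddress XOR SourceIPXSocket) MOD NumberNICs."""
    return (ipx_address ^ source_socket) % number_nics


def select_nic_both(ipx_address: int, source_socket: int, dest_socket: int,
                    number_nics: int) -> int:
    """SelectedNIC = (IPXAddress XOR SourceIPXSocket XOR DestIPXSocket) MOD NumberNICs."""
    return (ipx_address ^ source_socket ^ dest_socket) % number_nics
```

With the address 0x00A0C9112233, four NICs, a destination socket of 0x0451 and a source socket of 0x4003, the three schemes select NICs 2, 0 and 1 respectively. Unlike the plain MOD scheme, which always selects NIC 3 for this address, folding in socket numbers lets different sessions to the same address (including a router's address) use different NICs. Packets within any single session still stay on one NIC.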
In one embodiment, the present invention also provides a system and method for fault tolerance and fail over support. The plurality of NICs of the server computer system each include a filter that is adapted to mask a portion of a MAC address in an incoming data packet received at a NIC such that the MAC address in the incoming data packet is equivalent to the MAC address representing the NIC.
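The filter can be illustrated with a minimal sketch. The text does not state which portion of the MAC address is masked. The sketch assumes, hypothetically, that the NICs of one server share their first five MAC bytes and differ only in the last byte, so the filter ignores that byte. A packet still addressed to a failed NIC is then accepted by a functioning NIC instead of being dropped, as it would be under an exact-match check.

```python
# Hypothetical mask: compare the first five bytes, ignore the last byte.
MASK = bytes([0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def masked(mac: bytes, mask: bytes = MASK) -> bytes:
    """Apply the filter mask to a 6-byte MAC address."""
    return bytes(m & k for m, k in zip(mac, mask))


def accepts(nic_mac: bytes, dest_mac: bytes, mask: bytes = MASK) -> bool:
    """Accept an incoming packet if its masked destination MAC equals the masked NIC MAC."""
    return masked(dest_mac, mask) == masked(nic_mac, mask)


functioning_nic = bytes.fromhex("020000000002")  # hypothetical MAC of a working NIC
failed_nic = bytes.fromhex("020000000001")       # hypothetical MAC of the failed NIC
```

An exact comparison of `failed_nic` with `functioning_nic` fails, but `accepts(functioning_nic, failed_nic)` succeeds. A MAC address that differs outside the masked byte is still rejected.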
These and other objects and advantages of the present invention will become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.