1. Field of the Invention
Embodiments of the present invention relate generally to the field of computer networking and more specifically to preventing cache pollution when a first computing device in a computer network initiates a connection with a second computing device in the computer network.
2. Description of the Related Art
A typical computer network includes two or more computing devices coupled through a plurality of network connections. Each such computing device includes at least one network interface card (NIC) that implements an electrical interface between the computing device and the network. Typically, each computing device is connected to a network switch with an Ethernet cable that runs from the NIC to the network switch. Interconnecting computing devices through a network switch enables those computing devices to communicate with one another through the network switch, thereby forming a computer network.
Within a given computer network, the rate of data transmissions between two computing devices has historically been limited by the individual performance of each computing device, rather than the speed of the network. In recent years, improvements in computing device performance, especially in the area of network connection management, have allowed computing devices to generate data for network transmissions at a rate greater than the transmission rate of a single NIC. Consequently, many computing devices are now configured with multiple NICs, where each NIC is coupled to the network through an individual network connection (i.e., a separate Ethernet cable running to the network switch). In theory, with such a structure, the overall transmission rate of a computing device is equal to the sum of the individual transmission rates of the NICs included in the computing device. For example, if a computing device has three NICs, each having a transmission rate of one Gigabit per second, then the computing device should have an overall transmission rate of three Gigabits per second.
A desirable configuration for a computing device with multiple NICs is to designate a common Internet Protocol (IP) address for all of the NICs, while maintaining a unique Media Access Control (MAC) address for each NIC. Multiple NICs sharing a common IP address on a single computing device is referred to as a “team.” One aspect of using a team configuration is that network traffic may be distributed among the NICs in the team such that the overall throughput of the team may be maximized. This type of operation is referred to as “load balancing.” Another aspect of using a team configuration is that traffic may be migrated from a nonfunctional or unreliable NIC within the team to a functional or more reliable NIC within the team. This type of operation is referred to as “failover.” Load balancing and failover improve the throughput and reliability of the team's network connections, improving the efficiency of the corresponding computing device within the network.
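By way of illustration only, the load balancing and failover operations described above may be sketched as follows. The class names `Nic` and `Team`, the connection identifiers, and the least-loaded selection policy are all assumptions made for this sketch; the description above does not prescribe any particular balancing policy.

```python
from dataclasses import dataclass, field


@dataclass
class Nic:
    mac: str              # unique MAC address of this NIC
    healthy: bool = True  # False once the NIC is nonfunctional or unreliable


@dataclass
class Team:
    ip: str                                          # common IP address shared by the team
    nics: list                                       # the NICs that make up the team
    assignments: dict = field(default_factory=dict)  # connection id -> NIC MAC

    def assign(self, conn_id):
        """Load balancing (assumed policy): use the healthy NIC with the fewest connections."""
        healthy = [n for n in self.nics if n.healthy]
        if not healthy:
            raise RuntimeError("no functional NIC in team")
        load = {n.mac: 0 for n in healthy}
        for mac in self.assignments.values():
            if mac in load:
                load[mac] += 1
        chosen = min(healthy, key=lambda n: load[n.mac])
        self.assignments[conn_id] = chosen.mac
        return chosen.mac

    def fail(self, mac):
        """Failover: mark a NIC as nonfunctional and migrate its connections."""
        for n in self.nics:
            if n.mac == mac:
                n.healthy = False
        moved = [c for c, m in self.assignments.items() if m == mac]
        for c in moved:
            del self.assignments[c]
            self.assign(c)
        return moved
```

In this sketch, successive calls to `assign` spread connections across the NICs of the team, and a call to `fail` moves every connection on the failed NIC to a NIC that is still functional.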
One drawback of using a team structure, however, is that the initiation of new connections by any of the NICs within the team can “pollute” the Address Resolution Protocol (ARP) caches of the other computing devices within the computer network. ARP cache pollution results when all of the NICs within a team defined on a particular computing device share a common IP address and that computing device uses a conventional ARP broadcast request to initiate a new connection with another computing device within the computer network. The mechanics of ARP cache pollution and the networking problems resulting from ARP cache pollution are set forth in the following example.
FIG. 1 illustrates a computer network 100 that includes a first computing device 108, a second computing device 102, a third computing device 104, a switch 106, and a Dynamic Host Configuration Protocol (“DHCP”) server 110. The first computing device 108 includes a first NIC (“NIC1”) 116 and a second NIC (“NIC2”) 118, which couple the first computing device 108 to the switch 106 through a network connection 126 and a network connection 128, respectively. The second computing device 102 includes a NIC 112, which couples the second computing device 102 to the switch 106 through a network connection 122. The third computing device 104 includes a NIC 114, which couples the third computing device 104 to the switch 106 through a network connection 124. The DHCP server 110, a specialized computing device, contains a NIC 120, which couples the DHCP server 110 to the switch 106 through a network connection 130.
Each computing device in the computer network 100, including the DHCP server 110, is configured to have a unique IP address. Additionally, the first NIC 116 and the second NIC 118 of the first computing device 108 are configured as a team 117 and therefore share a common IP address (i.e., the IP address assigned to the first computing device 108). As is well known, each computing device in the computer network 100 includes a device driver program (not shown) that controls each NIC within that particular computing device. Typically, the TCP/IP stack of each computing device includes an ARP cache that tracks the IP addresses and corresponding MAC addresses associated with recent network communications through the computing device.
As also shown in FIG. 1, a TCP/IP connection 132 exists between the second NIC 118 of the first computing device 108 and the NIC 114 of the third computing device 104. For illustrative purposes only, this example assumes that the first computing device 108 initiates a new connection with the second computing device 102 through the first NIC 116. Because of the pre-existing TCP/IP connection 132, ARP cache pollution may occur in the third computing device 104 when the first computing device 108 initiates this second network connection. As set forth below, the origin of the ARP cache pollution problem lies in the way a network computing device typically establishes a new network connection—namely, by using an ARP broadcast request.
As is well known, MAC addresses are used to route traffic within a computer network. Consequently, a first computing device within a network generally cannot initiate a connection with a second computing device within the network without knowing the MAC address of the second computing device. The purpose of an ARP broadcast request is to allow the first computing device to request the MAC address of a second computing device knowing only the IP address of the second computing device. Therefore, an ARP broadcast request usually precedes any direct communication between two computing devices in a given computer network. However, every ARP broadcast request includes the IP and MAC addresses of the transmitting machine. Since the computing devices within the network maintain a collection of recent IP-to-MAC address translations within their respective ARP caches, those computing devices may update their respective ARP caches upon receiving an ARP broadcast request to reflect the IP and MAC addresses that the transmitting machine included in its ARP broadcast request. ARP cache updates of this sort may result in ARP cache pollution.
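To make concrete why every ARP broadcast request carries the IP and MAC addresses of the transmitting machine, the following sketch packs and unpacks the 28-byte ARP payload for Ethernet and IPv4 as laid out in RFC 826. The function names `build_arp_request` and `parse_sender` are hypothetical helpers introduced for this illustration, and the sketch covers only the ARP payload, not the enclosing Ethernet frame.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"  # network byte order, 28 bytes in total
ARP_REQUEST = 1                # operation code for a request


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Pack an ARP request payload; the target MAC is unknown, so it is zeroed."""
    return struct.pack(
        ARP_FORMAT,
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware (MAC) address length
        4,                            # protocol (IP) address length
        ARP_REQUEST,
        sender_mac,                   # sender MAC: always present
        socket.inet_aton(sender_ip),  # sender IP: always present
        b"\x00" * 6,                  # target MAC: what the sender wants to learn
        socket.inet_aton(target_ip),  # target IP: the only thing the sender knows
    )


def parse_sender(packet: bytes):
    """Return (operation, sender MAC, sender IP) as seen by any receiver."""
    _, _, _, _, oper, sha, spa, _, _ = struct.unpack(ARP_FORMAT, packet)
    return oper, sha, socket.inet_ntoa(spa)
```

Because the sender fields are populated in every request, each receiver of the broadcast can read a complete IP-to-MAC pair for the transmitting machine, which is the information a receiver may write into its ARP cache.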
Specific to the example, to initiate a connection with the second computing device 102, the first computing device 108 first has to determine the MAC address of the second computing device 102. Since the first computing device 108 will establish the connection through the first NIC 116, the first computing device 108 transmits an ARP broadcast request to the other computing devices within the computer network 100 that includes the IP address of the first computing device 108 and the MAC address of the first NIC 116. The ARP broadcast request is received by each of the other computing devices within the computer network 100 (i.e., the second computing device 102, the third computing device 104 and the DHCP server 110). The ARP cache of the third computing device 104 already includes an entry reflecting the IP address of the first computing device 108 and the MAC address of the second NIC 118 since these are the IP address and MAC address associated with the pre-existing TCP/IP connection 132 between the first computing device 108 and the third computing device 104. Since the ARP broadcast request includes an IP-to-MAC relationship (IP address of the first computing device 108 and MAC address of the first NIC 116) that is different from the IP-to-MAC relationship resulting from the TCP/IP connection 132 (IP address of the first computing device 108 and MAC address of the second NIC 118), the third computing device 104 may overwrite its ARP cache to reflect the “new” IP-to-MAC relationship for the first computing device 108 included in the ARP broadcast request.
Such a change in the ARP cache of the third computing device 104 is referred to as “ARP cache pollution” because the ARP cache entry corresponding to the existing TCP/IP connection 132 (IP address of the first computing device 108 and MAC address of the second NIC 118) is overwritten with a new ARP cache entry corresponding to the ARP broadcast request transmitted by the first computing device 108 through the first NIC 116 (IP address of the first computing device 108 and MAC address of the first NIC 116).
Importantly, when the entry in the ARP cache of the third computing device 104 associated with TCP/IP connection 132 is overwritten to reflect the IP address of the first computing device 108 and the MAC address of the first NIC 116, the TCP/IP connection 132 is disrupted. Specifically, all traffic for the TCP/IP connection 132 is redirected from the second NIC 118 on the first computing device 108 to the first NIC 116. More generally, all traffic transmitted to the first computing device 108 by any computing device on the computer network 100 may be redirected to the first NIC 116 rather than being directed to some other previously configured NIC on the first computing device 108.
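The sequence described above may be modeled with the following minimal simulation. The `Host` class, the `broadcast_arp_request` helper, and any addresses supplied to them are hypothetical. The cache update rule is a simplified reading of the reception behavior in RFC 826, in which a receiver overwrites an existing entry for the sender's IP address and the target of the request additionally learns a new entry; actual TCP/IP stacks may differ.

```python
class Host:
    def __init__(self, name, ip):
        self.name = name
        self.ip = ip
        self.arp_cache = {}  # IP address -> MAC address

    def on_arp_request(self, sender_ip, sender_mac, target_ip):
        if sender_ip in self.arp_cache:
            # An existing entry is overwritten; this is where pollution occurs.
            self.arp_cache[sender_ip] = sender_mac
        elif target_ip == self.ip:
            # The target of the request learns the sender's mapping.
            self.arp_cache[sender_ip] = sender_mac

    def mac_for(self, dest_ip):
        """MAC address this host would use for traffic sent to dest_ip."""
        return self.arp_cache[dest_ip]


def broadcast_arp_request(sender_ip, sender_mac, target_ip, receivers):
    """Deliver one ARP broadcast request to every other device on the network."""
    for host in receivers:
        host.on_arp_request(sender_ip, sender_mac, target_ip)
```

In this model, if the host standing in for the third computing device 104 starts with an entry mapping the team IP address to the MAC address of the second NIC 118, a broadcast sent through the first NIC 116 replaces that entry, and `mac_for` subsequently returns the MAC address of the first NIC 116 for the pre-existing connection.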
ARP cache pollution is particularly problematic in the face of established network connections, such as the TCP/IP connection 132, because those connections may be active and transferring data when they are interrupted and redirected, potentially resulting in data loss. Further, disrupting established network connections in this fashion may compromise any load balancing and/or failover settings previously in effect for the team 117 on the first computing device 108, leading to further data loss.
As the foregoing illustrates, what is needed in the art is a technique for initiating a new network connection between a first computing device in a computer network and a second computing device in the same network that avoids ARP cache pollution.