The present invention is an improvement of the prior art for methods and apparatuses for resolving and caching address resolution information between layer 3 protocol addresses and layer 2 hardware addresses. In particular, the present invention addresses difficulties presented when computer systems have more than one communications interface connected to the same layer 2 network.
The Internet Engineering Task Force (IETF) Request for Comments (RFC) 826 is the standard for how the Address Resolution Protocol (ARP) should work in many environments including layer 2 Ethernet networks and layer 3 IPv4. RFC 826 is incorporated herein in its entirety by this reference. RFC 826 also highlights issues that drive systems to cache address resolution information and issues that drive the need to put a time limit on how long such information can be cached before it is aged out. Most systems will cache an entry for a set length of time after it is learned and then remove it from the cache. After it is removed, a broadcast ARP request is sent to resolve the address again and then re-cache the results. Some Linux systems will retain a cache entry after it has aged out and the next time a packet is to be sent to that protocol address it will send a directed ARP request to the specific hardware address in the cache entry to verify the mapping is still correct. If the target system responds with an ARP response, the requesting system will refresh the cache entry. If not, the requesting system will send a broadcast ARP request to resolve the mapping.
Address resolution becomes far more challenging when systems have multiple communications interfaces connecting to the same broadcast domain of a network or the same subnet at layer 3. The ARP protocol described in RFC 826 and almost universally implemented in systems that utilize the Internet Protocol version 4 (IPv4) with Ethernet networks only supports mapping a specific protocol address (IP address with IPv4) to a single hardware address (Media Access Control or MAC address with Ethernet). There is substantial prior art attempting to deal with the additional challenges of address resolution when systems have multiple communications interfaces. A driving force is to allow for load balancing of traffic over the multiple interfaces. Another driving force is to allow the interfaces to be connected to multiple switches to allow for redundancy if an entire switch should fail.
Some prior art solutions allow for multiple interfaces to be connected to different switches but only one interface is utilized by the system at any particular point in time. If the primary interface fails or the switch the primary interface is connected to fails, the secondary interface is utilized. This is sometimes referred to as a Fault Tolerant configuration. A Fault Tolerant configuration does not provide for load balancing of traffic over the connections. Some prior art solutions succeed in allowing for load balancing when sending traffic over multiple interfaces but allow only one interface at a time to be used for receiving traffic from other systems. This is sometimes referred to as Transmit Load Balancing. Other prior art solutions allow for a group of interfaces on one system to be bundled together with multiple interfaces on a switch in a manner that allows multiple interfaces to look like one logical interface to both the system and the switch and allows traffic to be load balanced both when sending and receiving, but all of the interfaces in the bundle have to go to the same switch. This is sometimes referred to as Link Aggregation. The IEEE 802.1AX-2008 standard incorporated herein in its entirety by this reference describes the Link Aggregation Control Protocol (LACP) as a standardized way to perform link aggregation. Some solutions allow for the creation of multiple bundles to different switches, but only one bundle can be active at the same time, which can be seen as a combination of a Fault Tolerant configuration and Link Aggregation. Another group of solutions allows for multiple switches to appear as though they are a single switch and so the bundle of interfaces can connect to different switches as is exemplified by the variants of Multi-Link Trunking (MLT) designed by Nortel.
The following list of patents and patent applications provides a good understanding of the current state of the art and each of them is incorporated herein in its entirety by this reference:
    Patent application Ser. No. 11/208,690 titled “NETWORK RESOURCE TEAMING PROVIDING RESOURCE REDUNDANCY AND TRANSMIT/RECEIVE LOAD-BALANCING THROUGH A PLURALITY OF REDUNDANT PORT TRUNKS”;
    U.S. Pat. No. 7,505,399 titled “Receive load balancing on multiple network adapters”;
    U.S. Pat. No. 7,145,866 titled “VIRTUAL NETWORK DEVICES”;
    Patent application Ser. No. 11/048,520 titled “Automated selection of an optimal path between a core switch and teamed network resources of a computer system”;
    U.S. Pat. No. 6,687,758 titled “PORT AGGREGATION FOR NETWORK CONNECTIONS THAT ARE OFFLOADED TO NETWORK INTERFACE DEVICES”;
    Patent application Ser. No. 11/048,524 titled “Dynamic allocation and configuration of a computer system's network resources”;
    U.S. Pat. No. 6,151,297 titled “METHOD AND SYSTEM FOR LINK LEVEL SERVER/SWITCH TRUNKING”;
    U.S. Pat. No. 7,505,401 titled “Method, apparatus and program storage device for providing mutual failover and load-balancing between interfaces in a network”;
    Patent application Ser. No. 11/468,577 titled “METHOD AND SYSTEM OF TRANSMIT LOAD BALANCING ACROSS MULTIPLE PHYSICAL PORTS”;
    Patent application Ser. No. 10/439,494 titled “SYSTEM, METHOD, AND APPARATUS FOR LOAD-BALANCING TO A PLURALITY OF PORTS”;
    Patent application Ser. No. 10/938,156 titled “System and method for load balancing and fail over”;
    U.S. Pat. No. 6,056,824 titled “EXTENSION OF LINK AGGREGATION PROTOCOLS OVER THE NETWORK”; and
    U.S. Pat. No. 7,173,934 titled “System, device, and method for improving communication network reliability using trunk splitting”.
Patent application Ser. No. 11/208,690 is particularly interesting in that it puts forth a means for receive load balancing by having a system with multiple interfaces replace the hardware address in different ARP responses with the different hardware addresses of its interfaces. However, the patent application also correctly points out that as soon as the system sends out an ARP request, this will cause all of the systems on the network to start sending to one single interface.
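The limitation noted above can be illustrated with a small simulation. The following Python sketch is not taken from the cited application; the class name MultiHomedSystem, the round-robin choice of hardware address, and the assumption that outgoing ARP requests always carry the first interface's hardware address are all hypothetical and chosen only for illustration.

```python
from itertools import cycle
from typing import Dict, List


class MultiHomedSystem:
    """Hypothetical system with several interfaces on one broadcast domain."""

    def __init__(self, pa: str, interface_has: List[str]) -> None:
        self.pa = pa
        self.interface_has = interface_has
        self._next_ha = cycle(interface_has)

    def arp_reply_ha(self) -> str:
        # Assumption: each ARP response advertises the next interface in turn.
        return next(self._next_ha)

    def arp_request_sender_ha(self) -> str:
        # Assumption: every ARP request carries one fixed sender hardware address.
        return self.interface_has[0]


server = MultiHomedSystem("10.0.0.1", ["02:00:00:00:00:01", "02:00:00:00:00:02"])
peer_caches: Dict[str, Dict[str, str]] = {f"peer{i}": {} for i in range(1, 5)}

# Each peer resolves the server separately and receives a different answer.
for cache in peer_caches.values():
    cache[server.pa] = server.arp_reply_ha()
spread = sorted({cache[server.pa] for cache in peer_caches.values()})

# The server then broadcasts one ARP request; every peer that caches the
# sender fields overwrites its entry with the same hardware address.
for cache in peer_caches.values():
    cache[server.pa] = server.arp_request_sender_ha()
collapsed = sorted({cache[server.pa] for cache in peer_caches.values()})

print(spread)     # both interface addresses are in use
print(collapsed)  # only one interface address remains
```

Under these assumptions, the receive traffic that had been spread over two interfaces collapses onto one interface after a single broadcast request.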
Turning now to Prior Art FIG. 1, when System A needs to send a packet or frame and System B is the destination or the next hop at the network layer, System A will have determined the protocol address (PA) of System B but will not necessarily know the hardware address (HA) of System B unless it has the mapping from the protocol address for System B to the hardware address for System B in its ARP cache. If it does not have the mapping in its ARP cache, it will build an ARP request and broadcast the request to all of the systems in the broadcast domain. The fields in the ARP request are depicted in prior art FIG. 2. The definitions and intended uses of the fields can be found in RFC 826.
System A broadcasts the ARP request to all of the systems in the broadcast domain because, without knowing the hardware address for System B, it does not know how to send the ARP request directly to System B. Therefore, the only way available to System A to communicate the ARP request to System B is to indicate that all systems in the broadcast domain should look at the ARP request. System A includes its own protocol address and hardware address in the Sender Protocol Address and Sender Hardware Address fields of the ARP request as depicted in FIG. 2 to allow System B to cache that mapping as a result of the ARP request. Otherwise, since System B is likely to need to send frames back to System A in the near future, System B would soon need to send out its own ARP request to resolve the mapping from System A's protocol address to its hardware address. Since all of the systems in the broadcast domain look at the ARP request that System A sends, all of the systems can cache the mapping between System A's protocol address and hardware address.
The ARP request also includes the target protocol address that System A wants resolved. All of the systems look at the ARP request, but only System B recognizes the target protocol address as one it owns. Therefore, only System B sends an ARP reply back to System A with the mapping between System B's protocol address and hardware address. Because System B knows the hardware address for System A, it sends the ARP response directly to System A rather than sending it to all of the systems on the broadcast domain. Only System A sees the mapping between System B's protocol address and hardware address. The fields in the ARP reply are depicted in prior art FIG. 3. The definitions and intended uses of the fields can be found in RFC 826.
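As an illustration of the ARP request described above, the following Python sketch packs the RFC 826 fields for the Ethernet and IPv4 case. The function name build_arp_request and the example addresses are assumptions made for this sketch; the field order and sizes follow RFC 826, and the Ethernet frame header that would carry the broadcast destination address is omitted.

```python
import ipaddress
import struct

HTYPE_ETHERNET = 1       # hardware address space: Ethernet
PTYPE_IPV4 = 0x0800      # protocol address space: IPv4
OP_REQUEST = 1           # ARP opcode for a request


def build_arp_request(sender_ha: bytes, sender_pa: bytes, target_pa: bytes) -> bytes:
    """Pack an ARP request body (hypothetical helper, Ethernet/IPv4 only)."""
    if len(sender_ha) != 6 or len(sender_pa) != 4 or len(target_pa) != 4:
        raise ValueError("expected a 6-byte MAC and 4-byte IPv4 addresses")
    target_ha = bytes(6)  # unknown at request time, so sent as zeros
    return struct.pack(
        "!HHBBH6s4s6s4s",
        HTYPE_ETHERNET, PTYPE_IPV4,
        6, 4,             # hardware and protocol address lengths
        OP_REQUEST,
        sender_ha, sender_pa,
        target_ha, target_pa,
    )


system_a_ha = bytes.fromhex("02000000000a")
system_a_pa = ipaddress.IPv4Address("10.0.0.1").packed
system_b_pa = ipaddress.IPv4Address("10.0.0.2").packed

packet = build_arp_request(system_a_ha, system_a_pa, system_b_pa)
print(len(packet), packet.hex())
```

The resulting 28-byte body would be placed in an Ethernet frame addressed to the broadcast hardware address so that every system in the broadcast domain examines it.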
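The exchange described in the preceding paragraphs can be sketched as a small simulation. The Python below is a minimal model rather than a network implementation; the System class, its method names, and the example addresses are hypothetical, and it assumes that every receiver caches the sender mapping, as the description permits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class System:
    name: str
    pa: str                      # protocol address
    ha: str                      # hardware address
    arp_cache: Dict[str, str] = field(default_factory=dict)

    def on_arp_request(
        self, sender_pa: str, sender_ha: str, target_pa: str
    ) -> Optional[Tuple[str, str, str]]:
        # Every receiver may cache the requester's mapping.
        self.arp_cache[sender_pa] = sender_ha
        # Only the owner of the target protocol address answers.
        if target_pa != self.pa:
            return None
        # Reply is unicast: (destination HA, replier PA, replier HA).
        return (sender_ha, self.pa, self.ha)


a = System("A", "10.0.0.1", "02:00:00:00:00:0a")
b = System("B", "10.0.0.2", "02:00:00:00:00:0b")
c = System("C", "10.0.0.3", "02:00:00:00:00:0c")

# System A broadcasts a request for System B's protocol address.
replies: List[Tuple[str, str, str]] = []
for receiver in (b, c):
    reply = receiver.on_arp_request(a.pa, a.ha, target_pa=b.pa)
    if reply is not None:
        replies.append(reply)

# The reply is delivered only to System A, which caches the mapping.
for dest_ha, reply_pa, reply_ha in replies:
    if dest_ha == a.ha:
        a.arp_cache[reply_pa] = reply_ha

print(a.arp_cache, b.arp_cache, c.arp_cache)
```

In this model System C learns System A's mapping from the broadcast but never learns System B's mapping, because the reply is sent only to System A.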
As a result of this ARP exchange, System A will have cached the mapping between System B's protocol address and hardware address and System B will have cached the mapping between System A's protocol address and hardware address. As well, any of the other systems in the broadcast domain may have cached the mapping between System A's protocol address and hardware address. At this point, data traffic can be sent in both directions between System A and System B as is depicted in FIG. 1.
An exemplary ARP cache table is depicted in prior art FIG. 4. This depiction shows a protocol field in the cache. Each mapping is unique for a particular protocol. If a broadcast domain transports more than one protocol, more than one type of protocol address could map to the same hardware address. Another implementation could have separate ARP cache tables for each protocol. Also, if only one protocol is supported, the Protocol field would not be needed. FIG. 4 also depicts a VLAN field. When Virtual Local Area Networks (VLANs) are supported, each VLAN is logically its own broadcast domain. Therefore, there is a separate mapping between protocol address and hardware address for each VLAN and the VLAN field identifies which VLAN this cache entry is mapping. If the system does not support VLANs then the VLAN field is not needed. The IEEE 802.1Q specification incorporated herein in its entirety by this reference defines a standard for implementing VLANs. The table in FIG. 4 provides a mapping from a protocol address in the Protocol Address field to a hardware address in the Hardware Address field. One protocol address maps to only one hardware address. If a new ARP is received mapping the protocol address (on this VLAN) to a different hardware address, the new mapping will replace the current mapping. The FIG. 4 cache table also includes an Expire Time. This indicates when the cache entry should no longer be considered valid and should be removed. After the entry is removed, if the system needs to send a packet to the protocol address the entry was for, it will need to generate another ARP request and broadcast it to the broadcast domain.
Some Linux systems have adapted the ARP protocol a bit such that when the cache entry has expired, rather than removing the entry from the cache, they indicate it has expired and the next time a packet is being sent to that protocol address, the system sends a directed ARP request directly to the hardware address that was in the cache entry to verify whether the mapping is still valid. If it is, the target system will send back an ARP reply. This saves broadcasting an ARP request to every system in the broadcast domain. However, if the mapping is no longer valid, the system will still need to broadcast an ARP request and there will be a greater delay before the mapping is resolved, during which the data traffic is being queued or dropped.
An object of the present invention is to provide an improved method for caching address mappings between protocol addresses and hardware addresses that minimizes the time that data traffic is queued or dropped while still allowing for cache entries to be aged out when they are not being used.
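The cache table described above can be sketched as a dictionary keyed by protocol, VLAN, and protocol address. The Python below is one possible model, not the structure of FIG. 4 itself; the names ArpCache, learn, and lookup, the 60-second lifetime, and the explicit now argument (used instead of a real clock so the behavior is repeatable) are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, int, str]  # (protocol, VLAN, protocol address)


@dataclass
class CacheEntry:
    hardware_address: str
    expire_time: float


class ArpCache:
    def __init__(self, lifetime: float = 60.0) -> None:
        self.lifetime = lifetime  # assumed fixed lifetime in seconds
        self.entries: Dict[Key, CacheEntry] = {}

    def learn(self, protocol: str, vlan: int, pa: str, ha: str, now: float) -> None:
        # A new mapping for the same key replaces the current mapping.
        self.entries[(protocol, vlan, pa)] = CacheEntry(ha, now + self.lifetime)

    def lookup(self, protocol: str, vlan: int, pa: str, now: float) -> Optional[str]:
        key = (protocol, vlan, pa)
        entry = self.entries.get(key)
        if entry is None:
            return None
        if now >= entry.expire_time:
            del self.entries[key]  # aged out: caller must broadcast a new request
            return None
        return entry.hardware_address


cache = ArpCache(lifetime=60.0)
cache.learn("IPv4", 10, "10.0.0.2", "02:00:00:00:00:0a", now=0.0)
cache.learn("IPv4", 20, "10.0.0.2", "02:00:00:00:00:0b", now=0.0)
cache.learn("IPv4", 10, "10.0.0.2", "02:00:00:00:00:0c", now=5.0)  # replaces VLAN 10 entry

vlan10_at_30 = cache.lookup("IPv4", 10, "10.0.0.2", now=30.0)
vlan20_at_30 = cache.lookup("IPv4", 20, "10.0.0.2", now=30.0)
vlan20_at_61 = cache.lookup("IPv4", 20, "10.0.0.2", now=61.0)
vlan10_at_61 = cache.lookup("IPv4", 10, "10.0.0.2", now=61.0)
print(vlan10_at_30, vlan20_at_30, vlan20_at_61, vlan10_at_61)
```

Keying on the VLAN lets the same protocol address resolve to different hardware addresses in different broadcast domains, while a later ARP for the same key simply overwrites the earlier mapping.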
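The expired-entry behavior described above can be sketched as a decision function. This Python model is an illustration only and does not reproduce the Linux neighbor cache code; the names resolve_action, on_directed_timeout, and on_arp_reply, the action labels, and the 60-second lifetime are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

BROADCAST_HA = "ff:ff:ff:ff:ff:ff"


@dataclass
class Entry:
    ha: str
    expire_time: float


def resolve_action(cache: Dict[str, Entry], pa: str, now: float) -> Tuple[str, str]:
    """Return (action, destination hardware address) for a packet bound to pa."""
    entry = cache.get(pa)
    if entry is None:
        return ("arp_request", BROADCAST_HA)  # nothing cached: broadcast
    if now < entry.expire_time:
        return ("send_data", entry.ha)        # mapping still fresh
    return ("arp_request", entry.ha)          # expired: directed verification


def on_arp_reply(cache: Dict[str, Entry], pa: str, ha: str, now: float,
                 lifetime: float = 60.0) -> None:
    # A reply refreshes (or creates) the entry.
    cache[pa] = Entry(ha, now + lifetime)


def on_directed_timeout(cache: Dict[str, Entry], pa: str) -> Tuple[str, str]:
    # No answer to the directed request: drop the entry and fall back to broadcast.
    cache.pop(pa, None)
    return ("arp_request", BROADCAST_HA)


cache: Dict[str, Entry] = {}
first = resolve_action(cache, "10.0.0.2", now=0.0)
on_arp_reply(cache, "10.0.0.2", "02:00:00:00:00:0b", now=1.0)
fresh = resolve_action(cache, "10.0.0.2", now=30.0)
stale = resolve_action(cache, "10.0.0.2", now=90.0)
fallback = on_directed_timeout(cache, "10.0.0.2")
print(first, fresh, stale, fallback)
```

The fallback path shows the extra delay noted above: when the directed request goes unanswered, a broadcast request must still follow before data can be sent.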
An additional object of the present invention is to provide an improved method and mechanism for receive load-balancing of data traffic over multiple communications interfaces connected to the same broadcast domain wherein the communications interfaces do not need to be connected to the same switch.