1. Field of the Invention
The present disclosure relates generally to Equal Cost MultiPath (ECMP) nexthop sets and Link Aggregation Groups (LAGs), and more particularly to systems and methods for implementing adaptive load balancing between ports belonging to a nexthop set and/or link aggregation group.
2. Description of Related Art
In a packet network, “nodes” or “routers” share network address information that allows each node or router to forward packets toward their respective destination networks. For networks defined using the Internet Protocol, each node is provisioned with a network address that identifies the particular network the system is on, and with a system or host address that uniquely identifies the node. These addresses are shared among neighboring nodes to allow each router to build a “tree” with itself as the root node and next-hop paths from itself to every address on the network.
IP addresses consist of a 32-bit (for IPv4) or 128-bit (for IPv6) address that is a concatenation of the network address and the host address. Considering the 32-bit IPv4 address with variable-length subnet masking as an example, some number of the leading address bits are used as the network subaddress, and the remaining bits are used as the host subaddress. For instance, two devices could both be located on the network 192.168.10.0/24, written in a notation that represents the four bytes of an IPv4 address as four decimal values separated by periods. The suffix “/24” denotes a network subnet mask, i.e., it states that the first 24 bits (192.168.10) are the network portion of the IP address, and thus the last eight bits are the host address. Thus two hosts on this network could have the IP addresses 192.168.10.2 and 192.168.10.4.
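The network/host split described above can be illustrated with a short sketch using Python's standard ipaddress module. The helper name split_address is hypothetical and is introduced only for illustration.

```python
import ipaddress

# The example network and hosts from the text above.
network = ipaddress.ip_network("192.168.10.0/24")
host_a = ipaddress.ip_address("192.168.10.2")
host_b = ipaddress.ip_address("192.168.10.4")


def split_address(addr, net):
    """Return (network portion, host portion) of addr as integers.

    Hypothetical helper: masks the address with the subnet mask to get
    the network portion and with the inverse mask to get the host portion.
    """
    value = int(addr)
    return value & int(net.netmask), value & int(net.hostmask)


for host in (host_a, host_b):
    net_part, host_part = split_address(host, network)
    # Print the address, its network portion in dotted form, and its host number.
    print(host, ipaddress.ip_address(net_part), host_part)
```

Both hosts yield the same network portion and differ only in the low eight bits, which is what places them on the same /24 network.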
Routers use IP network and host addresses to forward routed traffic on a packet network according to a routing protocol. Some common routing protocols in use today include Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), and Border Gateway Protocol (BGP). OSPF is further described in Internet Engineering Task Force (IETF) Request for Comments (RFC) 2328, “OSPF Version 2,” by J. Moy, April 1998, and IETF RFC 2740, “OSPF for IPv6,” by R. Coltun et al., December 1999. IS-IS is further described in the International Organization for Standardization (ISO) document ISO/IEC 10589:2002, “Intermediate System to Intermediate System intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service (ISO 8473),” 2nd Ed. BGP is further described in IETF RFC 4271, “A Border Gateway Protocol 4 (BGP-4),” by Y. Rekhter et al., January 2006.
OSPF and IS-IS are examples of link-state protocols. A “link” can be considered to be an interface or port on a router (although such protocols can be used to distribute other information). The state of that link contains a description of the interface and what routers/networks are reachable through that link. In OSPF, a link-state database would contain the IP address of the interface/device, the subnet mask and other information describing the network, a list of routers connected to that network, a cost of sending packets across that interface, etc.
OSPF routers use link-state advertisements (LSAs) to share information from their link-state databases with neighboring routers in the same autonomous system. Whenever an interface is brought up or a change in routing information known to the router occurs, the router generates an LSA to inform its neighbors of the new or changed link-state information. When a neighbor router receives the LSA, it updates its own link-state database and then propagates the information in another LSA to its neighbors. Thus the LSA is flooded to all routers, and all routers contain the same link-state database.
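The flooding behavior can be sketched as follows. This is a simplified illustration, not the OSPF procedure itself: acknowledgments, aging, and checksums are omitted, and the router names, the adjacency map, the LSA identifier, and the function flood_lsa are all hypothetical.

```python
from collections import deque


def flood_lsa(adjacency, lsdb, origin, lsa_id, seq):
    """Flood one LSA from origin to every reachable router.

    adjacency maps each router to its neighbors; lsdb maps each router to
    its link-state database, modeled here as {lsa_id: sequence number}.
    A router installs and re-floods an LSA only if it is newer than its copy.
    """
    lsdb[origin][lsa_id] = seq
    pending = deque([origin])
    while pending:
        router = pending.popleft()
        for neighbor in adjacency[router]:
            if lsdb[neighbor].get(lsa_id, -1) < seq:
                lsdb[neighbor][lsa_id] = seq   # update the neighbor's database
                pending.append(neighbor)       # the neighbor floods it onward


# Hypothetical four-router topology containing a loop.
adjacency = {
    "RA": ["RB", "RC"],
    "RB": ["RA", "RD"],
    "RC": ["RA", "RD"],
    "RD": ["RB", "RC"],
}
lsdb = {router: {} for router in adjacency}
flood_lsa(adjacency, lsdb, "RA", "RA-link-1", seq=1)
print(lsdb)
```

The sequence-number check is what stops the flood: a router that already holds the same or a newer copy does not propagate it again, so the loop in the topology does not cause endless re-flooding.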
Whenever a router receives an update to its link-state database, it uses a shortest path algorithm (the Dijkstra algorithm) to calculate a shortest path tree to all destinations, based on the accumulated costs associated with the links used to reach each destination. The shortest path tree will differ for each router, as each places itself at the root of the tree, but all routers should agree with each other as to routes. In other words, no routing loops should exist where node A thinks that a destination should be reached through a node B, and node B thinks that the destination should be reached through node A.
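A minimal sketch of the shortest path tree calculation is given below, using a textbook form of the Dijkstra algorithm with a binary heap. The four-node graph, its link costs, and the function name shortest_path_tree are assumptions made for illustration and do not correspond to FIG. 1.

```python
import heapq


def shortest_path_tree(graph, root):
    """Return (dist, parent) for the shortest path tree rooted at root.

    graph maps node -> {neighbor: link cost}. dist holds the accumulated
    cost to each node; parent holds the previous node on the chosen path.
    """
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return dist, parent


# Hypothetical link-state database shared by all routers.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"C": 1},
}
dist_a, parent_a = shortest_path_tree(graph, "A")
dist_d, parent_d = shortest_path_tree(graph, "D")
print(dist_a, parent_a)
print(dist_d, parent_d)
```

Running the same function with a different root yields a different tree from the same database, yet the two trees agree on routes: A reaches D through B and then C, and D reaches A through C and then B.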
In order to place limits on the flooding of LSAs, OSPF allows routers in the same autonomous system to be grouped into areas. For instance, FIG. 1 depicts two areas A0, A1 of an autonomous system (AS) 100. Every AS must have an area 0 or backbone area. Generally, all other areas connect to the backbone area, although provisions exist for transit areas.
Routers are classified according to their position in the AS. An internal router has all of its interfaces in the same area. In area A0, routers R1 and R2 are internal routers. In area A1, router R5 is an internal router. An area border router (ABR) has interfaces in multiple areas of the AS. R3 has two interfaces in area A0, and at least one interface in another area (not shown), and is thus an ABR. Likewise R4 has two interfaces in area A0, and two interfaces in area A1, making it an ABR as well. An autonomous system boundary router (ASBR) has at least one interface in an area of the AS and at least one interface to another AS or running another routing protocol. The ASBR redistributes information received from the foreign network/protocol within OSPF. In FIG. 1, routers R6 and R7 are ASBRs. Both R6 and R7 communicate with a router R8 outside of the AS using BGP.
Assuming that the routing algorithm cost is the same for R5 to reach either R6 or R7, and the cost is the same for either R6 or R7 to reach R8, R5 has two equal cost paths available (R5-R6-R8 and R5-R7-R8) that it may use to forward traffic for routes advertised by R8. Routing protocols such as OSPF and IS-IS allow two such routes to have equal cost. Equal-Cost MultiPath (ECMP) algorithms, such as those described in Internet Engineering Task Force (IETF) Request For Comments (RFC) 2991, “Multipath Issues in Unicast and Multicast Next-Hop Selection,” 2000, incorporated herein by reference, split traffic between multiple equal-cost next-hops by approaches such as hashing of partial packet header contents. When successful, such an approach increases the bandwidth available between two nodes beyond the bandwidth available on a single link.
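One way to picture hashing of partial packet header contents is the sketch below. The choice of header fields, the use of CRC-32 as the hash, and the function names flow_hash and pick_next_hop are assumptions for illustration; actual implementations may hash different fields with a different function.

```python
import zlib


def flow_hash(src_ip, dst_ip, protocol, src_port, dst_port):
    """Hash selected header fields of a packet into a 32-bit value.

    CRC-32 is used here only because it is deterministic and in the
    standard library; it is an assumed stand-in for a hardware hash.
    """
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key)


def pick_next_hop(next_hops, *header_fields):
    """Choose one of the equal-cost next-hops from the flow hash."""
    return next_hops[flow_hash(*header_fields) % len(next_hops)]


next_hops = ["R6", "R7"]  # the two equal-cost next-hops from R5 toward R8
flow = ("10.0.0.1", "172.16.5.9", 6, 40000, 80)  # hypothetical TCP flow

# Every packet of the same flow hashes to the same next-hop.
first = pick_next_hop(next_hops, *flow)
second = pick_next_hop(next_hops, *flow)
print(first, second)
```

Because the selection depends only on header fields that are constant within a flow, packets of one flow stay on one path and are not reordered, while different flows can land on different next-hops.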
Independent of routing and ECMP, in high-performance networks more traffic may need to pass between two adjacent nodes than can be carried on a single link. Link aggregation refers to a layer-2 process for operating a group of physical links as if they were a single link. At least one standard for logical link aggregation has been promulgated by the Institute of Electrical and Electronics Engineers, e.g., in the IEEE 802.3-2005 standard, Clause 43 and Annexes 43A-C, incorporated herein by reference. This standard sets out an orderly process for adjacent nodes to recognize that both nodes share multiple links that each node allows to be managed as a single logical link. This concept is shown in FIG. 1 as parallel links L1, L2 connecting R5 and R6, and parallel links L3 and L4 connecting R5 and R7. Like ECMP, link aggregation splits traffic between multiple paths (in this case the aggregated links), potentially increasing the bandwidth available between two nodes beyond the bandwidth available on a single link.
In FIG. 1, when a packet arriving on link L5 at R5 has a next-hop of R8, ECMP allows R5 to select either R6 or R7 as a first hop. Once R6 or R7 is selected, LAG selects one of the two available parallel links to the selected first hop router for transmission of the packet.
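The two-stage selection just described can be sketched as follows. The function select_egress and its two hash inputs are hypothetical; how the hash values are derived from the packet is left out, and the modulo selection at each stage is an assumed policy.

```python
# LAG membership for R5 as drawn in FIG. 1: links L1, L2 to R6; L3, L4 to R7.
LAG_LINKS = {"R6": ["L1", "L2"], "R7": ["L3", "L4"]}
ECMP_NEXT_HOPS = ["R6", "R7"]


def select_egress(ecmp_hash, lag_hash):
    """Pick a first-hop router with ECMP, then a link within its LAG.

    ecmp_hash and lag_hash are assumed to be precomputed per-packet hash
    values; each stage reduces its hash modulo the number of choices.
    """
    first_hop = ECMP_NEXT_HOPS[ecmp_hash % len(ECMP_NEXT_HOPS)]
    links = LAG_LINKS[first_hop]
    return first_hop, links[lag_hash % len(links)]


print(select_egress(ecmp_hash=5, lag_hash=2))  # ('R7', 'L3')
print(select_egress(ecmp_hash=4, lag_hash=1))  # ('R6', 'L2')
```

The link choice is always constrained by the router choice: once ECMP has selected R7, only L3 and L4 are candidates, whatever the second hash value is.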
FIG. 2 contains a block diagram for one possible implementation of R5 as a modular switch/router. Up to n line cards, LC0 to LCn−1, serve the external ports of the device, including the five ports labeled L1, L2, L3, L4, and L5 in FIG. 1. Ports L1 and L2 connect to two port interfaces of line card LC0; port L5 connects to one of the port interfaces of line card LC1; and ports L3 and L4 connect to two port interfaces of line card LCn−1.
Each line card LCi contains, in addition to the external port interfaces, one or more packet processors PPi, a content-addressable memory CAMi, a line card processor LCPi, and one or more switch fabric buffers. The packet processors read and manipulate headers on the packets/frames passing through their respective port interfaces. As part of this process, the packet processor typically performs one or more lookup operations on the corresponding CAM, which stores forwarding information, allowing the packet processor to determine an appropriate outgoing port interface or port interfaces for the packet. An incoming packet processor attaches a small “backplane” header to each packet, which includes the appropriate egress port(s), and submits the packets to the switch fabric buffers. An outgoing packet processor operates in the reverse direction—it reads packets from the switch fabric buffers, removes the backplane headers and performs any necessary processing, and submits them to the appropriate egress port(s).
The line cards mate with a backplane 210, which may be electrical and/or optical, containing signal paths to connect the line cards with one or more switch fabric cards, SFC, and one or more route processor manager cards, RPM. The switch fabric card(s) contain a high-throughput switch fabric SF that moves incoming packets in the various line card switch fabric buffers to the appropriate switch fabric buffers for the indicated egress ports. The route processor manager card RPM includes one or more processors that manage overall operation of the card, including a route processor RP.
The route processor RP maintains a global view of the attached network(s), including which layer 3 hosts and networks are reachable through each port, the layer 2 status of each port, the LAN segments/hosts reachable through each port, etc. The routing/switching protocols running on the RP communicate over the backplane with each line card processor LCPi to receive updated information pertinent to each line card's ports. The RP communicates routing/switching updates back to each line card processor LCPi, which LCPi uses to update the contents of CAMi. In the specific FIG. 1 example, the RP runs OSPF and is aware of the LAG connecting R5 to R6 and the other LAG connecting R5 to R7. This information allows the RP to communicate forwarding entries to LCP1 for storage in CAM1. When a packet arrives on L5 having a destination reachable through R8, PP1 will consult the appropriate forwarding entries to select one of L1, L2, L3, and L4 as an outgoing interface for the packet.
One possible packet processor egress interface selection mechanism 300 is shown in FIG. 3. When a packet P1 arrives at the packet processor, a lookup engine LE obtains a copy of the packet headers. Lookup engine LE combines information from the packet headers in various ways to construct a number of keys, including a CAM key and a hash key. The hash key is supplied to a hash calculator HC, which hashes the key to produce a hash value. Simultaneously, the CAM key is supplied to a forwarding table FT, e.g., stored in the CAM, which returns the appropriate forwarding entry for the packet. When the forwarding entry matches an ECMP set or link aggregation group, a multiple interface flag MIF is set to indicate that selection of one of multiple outbound interfaces is required. In that case, a number of interfaces field #IF in the forwarding entry is extracted and supplied to a MOD function, which returns the remainder of the hash value divided by #IF (a number between 0 and #IF-1), labeled the selector. The selector is used to index one of the #IF entries in an IF list to determine the correct outbound IF. The IF list may be embedded in the forwarding entry, or may be stored elsewhere and pointed to by the forwarding entry.
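The selector computation of FIG. 3 can be summarized in a short sketch. The class ForwardingEntry and its field names are hypothetical stand-ins for the forwarding entry, the MIF flag, the #IF field, and the IF list; the hash value is taken as an input, since the text does not specify the hash function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForwardingEntry:
    """Hypothetical model of a forwarding entry returned by the table FT."""
    multiple_interface_flag: bool   # MIF: set for an ECMP set or a LAG
    if_list: List[str]              # IF list, here embedded in the entry

    @property
    def num_interfaces(self) -> int:
        """The #IF field: number of entries in the IF list."""
        return len(self.if_list)


def select_outbound_if(entry: ForwardingEntry, hash_value: int) -> str:
    """Return the outbound interface for a packet with this hash value."""
    if not entry.multiple_interface_flag:
        return entry.if_list[0]     # single interface; no selection needed
    # MOD function: remainder of hash / #IF, a number from 0 to #IF-1.
    selector = hash_value % entry.num_interfaces
    return entry.if_list[selector]


entry = ForwardingEntry(multiple_interface_flag=True,
                        if_list=["L1", "L2", "L3", "L4"])
print(select_outbound_if(entry, 10))  # 10 mod 4 = 2, so index 2: L3
```

Because the selector is a pure function of the hash value and #IF, the share of traffic each interface receives is fixed by how the hash values happen to distribute, independent of how heavily each interface is actually loaded.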