The present invention relates to information handling systems (IHSs) that include network switches, i.e. devices that forward data in computer networks. More particularly, the invention relates to IHSs that can process multicast traffic in computer networks.
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option is an IHS. An IHS generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes. Because technology and information handling needs and requirements may vary between different applications, IHSs may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in IHSs allow for IHSs to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. IHSs may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems, such as a network switch.
FIG. 1 shows an example of a computer network with nodes 110 (shown as 110.1, 110.2, 110S.1, 110R.2, etc.) interconnected by wired or wireless links 120. Each node 110 is an IHS, and may (or may not) include a network switch, i.e. a node 110 may forward data transmitted between other nodes. Some switches are shown at 110S and 110R. As shown in FIG. 2, each link 120 is connected to port interfaces Px (i.e. P1, P2, etc.) of two or more nodes 110. A data packet transmitted on a link 120 may include a layer-2 packet 208 (FIG. 3), which includes a layer-2 source address 210S, a layer-2 destination address 210D, and a layer-2 payload 210P. Each of source and destination addresses 210S, 210D can be a physical address of a port interface Px of a node 110, or can be a logical layer-2 address of a group of ports of the same or different switches 110 any one of which can process the packet. (We use the words “port” and “port interface” interchangeably. A port can be a physical wired or wireless port, or for example can be part of a physical port's bandwidth. Both parallel and serial ports are covered by this term.) Logical layer-2 addresses are used to form Link Aggregation Groups (LAGs) described below.
A switch 110S or 110R, e.g. 110S.1, has a number of ports Px connected to respective LAN segments 130 (Local Area Network segments). Each LAN segment 130 includes one or more nodes 110. The switch 110 (110S or 110R) may be a layer-2 switch that forwards packets based on layer-2 addresses 210S, 210D. However, if a packet is addressed to the switch itself, i.e. the destination address 210D identifies the switch, then the switch may use layer-2 payload 210P to process the packet. Layer-2 payload 210P may include a layer-3 packet (e.g. IP packet) as shown in FIG. 3. The layer-3 packet includes a layer-3 source address 220S, a layer-3 destination address 220D, and a layer-3 payload 220P. The switch may forward the packet based on layer-3 destination address 220D for example.
A switch may or may not be capable of performing such layer-3 forwarding. As used herein, the term “switch” is a general term for a forwarding network node, including bridges and routers. The term “router” means a switch that can perform layer-3 forwarding, i.e. forwarding based on layer-3 destination address 220D. A router may or may not perform layer-2 forwarding. Some routers are marked as 110R in FIG. 1.
To forward a packet 208, the switch 110 determines an interface Px on which the packet must be transmitted. The interface is determined from the destination layer-2 or layer-3 address 210D or 220D. The switch learns the layer-2 addresses from incoming packets: if a packet arrives at some interface from some source address 210S, the switch associates the address with the interface for future forwarding operations. A router 110 learns the layer-3 address information from other routers, which exchange the pertinent information by executing routing protocols (such as Routing Information Protocol (RIP), Open Shortest Path First (OSPF), Border Gateway Protocol (BGP), and others).
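The layer-2 learning described above can be illustrated with a minimal sketch. The class name, method name, and address strings below are hypothetical and chosen only for illustration. Flooding a packet whose destination has not yet been learned is conventional bridge behavior assumed here; it is not stated in the text above.

```python
class LearningSwitch:
    """Hypothetical layer-2 switch that learns addresses from incoming packets."""

    def __init__(self, ports):
        self.ports = list(ports)   # port interfaces Px, e.g. ["P1", "P2", "P3"]
        self.table = {}            # learned layer-2 address -> port interface

    def receive(self, in_port, src_addr, dst_addr):
        """Learn the source address, then return the list of egress ports."""
        # Associate the source address (210S) with the arrival interface.
        self.table[src_addr] = in_port
        out_port = self.table.get(dst_addr)
        if out_port is None:
            # Assumption: an unknown destination is flooded on all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # The destination is reached through the arrival port; nothing to forward.
            return []
        return [out_port]
```

In this sketch, a switch that first sees a packet from address A on port P1 will later forward packets destined to A only on P1.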
This layer-2 or layer-3 knowledge gained by each switch is suitable for unicast transmissions, i.e. when each address 210D or 220D identifies a single node 110. This knowledge is hard to use for layer-3 multicast, i.e. when an address 220D identifies a group of nodes 110.
A multicast transmission can reduce network utilization by transmitting only one copy of a packet over a shared path. For example, if a node 110.4 in FIG. 1 transmits a multicast packet 208 to a group including the nodes 110.9 and 110.10, then only one copy of the packet needs to be delivered from node 110.4 to router 110R.2. The packet is duplicated only at router 110R.2, with one copy transmitted to each of nodes 110.9, 110.10 over separate paths. Significant gains in network utilization can be achieved, especially when large numbers of such packets need to be transmitted (for example if the packets are a moving picture distributed to millions of viewers, or are voices and images of teleconference participants).
An important goal of multicast processing is to reduce redundant traffic: preferably, at most one copy of each packet should appear on each link 120. This goal is also important for unicast transmissions: unicast packets can be unnecessarily replicated due to presence of loops (redundant paths) in the network. For example, a packet can reach the switch 110S.1 from router 110R.4 via a path through router 110R.2, or a path through router 110R.3; there are paths through any one or both of these routers. Redundant paths are provided in order to increase the network bandwidth and reliability, but they may have to be disabled to reduce traffic replication. To keep redundant paths active, a network may use Link Aggregation Groups (LAGs) or Equal Cost Multi-Path routing (ECMP).
A LAG denotes a group of ports which is associated with a single logical layer-2 address. For example, in FIG. 1, port P4 of router 110R.4 is a LAG port, containing physical ports connected respectively to ports P4 of routers 110R.2 and 110R.3. If router 110R.4 must forward a packet on its port P4, the router transmits the packet on just one of the physical ports, so the packet is forwarded to router 110R.2 or 110R.3 but not both. (The physical port may be selected randomly, and/or based on a hash of information in the packet, e.g. of the headers' fields 210S, 210D, 220S, 220D, the IP type field (not shown), and/or some other fields.)
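A hash-based selection of the physical port can be sketched as follows. The function name, the member labels, and the choice of CRC-32 as the hash are assumptions made for illustration; an actual switch may use a different hash and different header fields.

```python
import zlib


def select_lag_member(members, src_l2, dst_l2, src_l3, dst_l3):
    """Pick one physical port of a LAG from a hash of packet header fields.

    `members` is the list of physical ports in the LAG. The four address
    arguments stand for the layer-2 and layer-3 source and destination
    addresses of the packet, given as strings.
    """
    key = "|".join((src_l2, dst_l2, src_l3, dst_l3)).encode("utf-8")
    # CRC-32 is a hypothetical choice of hash; any deterministic hash works.
    index = zlib.crc32(key) % len(members)
    return members[index]
```

Because the hash is deterministic, all packets with the same header fields leave on the same physical port, while packets with different fields may be spread over the LAG members.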
Further reduction in packet replication can be achieved by coordination among routers. For example, routers 110R.2 and 110R.3 can form a Virtual Link Trunking (VLT) system 140, such as described in U.S. Pre-Grant Patent Publication US 2011/0292833 (Dec. 1, 2011) incorporated herein by reference; both routers 110R.2, 110R.3 can be of type S4810 available from Dell Inc. of Texas, United States. In the example of FIG. 1, “InterCluster” Link 120.0 (ICL) of VLT system 140 is connected to ports P1 of routers 110R.2 and 110R.3. The ports P3 of the two routers are connected to a LAG port P3 of switch 110S.1. The ports P5 of routers 110R.2 and 110R.3 are connected to a LAG port P5 of router 110R.20.
The ports such as P3, P4, P5 of routers 110R.2 and 110R.3 will be called virtual ports herein. More particularly, if the two routers 110R.2 and 110R.3 have ports connected to a common LAG port of another switch, such ports of routers 110R.2 and 110R.3 will be called virtual ports. The routers 110R.2 and 110R.3 may have any number of virtual ports.
Routers 110R.2 and 110R.3 may include non-virtual ports, such as port P10 of router 110R.2.
Routers 110R.2 and 110R.3 exchange learned information regarding packet forwarding. The exchange is performed via link 120.0.
The traffic received on link 120.0 is restricted to reduce traffic replication. More particularly, if a VLT member router 110R.2 or 110R.3 receives a packet on link 120.0, the router will not forward the packet on any virtual port. For example, if router 110R.2 receives a packet on port P1, it will not forward the packet on its ports P3, P4, P5 because the packet is forwarded to switches 110S.1, 110R.4, 110R.20 by router 110R.3 if needed.
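The restriction on traffic received over link 120.0 can be expressed as a simple filter. The function and parameter names below are hypothetical. Excluding the arrival port from the egress list is an additional assumption, not taken from the text above.

```python
def egress_ports(candidate_ports, in_port, icl_port, virtual_ports):
    """Filter the egress ports of a VLT member router for one packet.

    candidate_ports: ports on which the packet would otherwise be forwarded
    in_port:         port on which the packet arrived
    icl_port:        port connected to link 120.0 (P1 in FIG. 1)
    virtual_ports:   set of virtual ports (e.g. P3, P4, P5)
    """
    result = []
    for port in candidate_ports:
        if port == in_port:
            # Assumption: a packet is never sent back on its arrival port.
            continue
        if in_port == icl_port and port in virtual_ports:
            # Received on link 120.0: the peer router serves the virtual ports.
            continue
        result.append(port)
    return result
```

For router 110R.2, a packet arriving on port P1 with candidate ports P3, P4, P5, P10 would be forwarded only on the non-virtual port P10, whereas a packet arriving on port P4 would not be restricted by this rule.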
ECMP is a layer-3 mechanism to suppress traffic replication while keeping redundant paths. In ECMP, the layer-3 destination address is associated with a group of ports by the router's database. The router forwards a packet on just one of the ports. The port on which the packet is forwarded may be selected randomly and/or based on a hash of the packet header's fields.
Some challenges for multicast transmission will now be described using the example of IGMP (Internet Group Management Protocol) and Sparse-Mode PIM (Protocol Independent Multicast). IGMP is defined for example by RFC 4604 (Internet Engineering Task Force (IETF), August 2006). Sparse-Mode PIM is defined by RFC 4601 (IETF, August 2006). RFCs 4604 and 4601 are incorporated herein by reference. IGMP defines how a multicast end-point (sender or receiver) 110 can request joining or leaving a multicast group. PIM defines how routers 110R set up multicast paths for distribution of multicast packets.
According to Sparse-Mode PIM, each end-point sender or receiver 110 of multicast traffic is associated with a single Designated Router (DR). Suppose for example that the switch 110S.1 does not perform layer-3 forwarding. Then end-point nodes 110.1, 110.2, 110.3 can be associated with router 110R.2 or 110R.3 as a DR. However, in order to reduce traffic replication, RFC 4601 allows only one router to serve as a DR for a LAN. The reason for this restriction is as follows. Suppose that both routers 110R.2 and 110R.3 serve as DRs. Suppose further that a multicast group contains nodes 110.1, 110.2, 110.3; router 110R.2 serves as a DR for nodes 110.1 and 110.2, and router 110R.3 is a DR for node 110.3. Then a packet from node 110.4 to the group would be forwarded to nodes 110.1 and 110.2 through router 110R.2, and to node 110.3 through router 110R.3. Therefore, the packet would have to be duplicated at router 110R.4. If only router 110R.2 served as a DR, then the packet could be delivered to all nodes 110.1, 110.2, 110.3 without duplication.
On the other hand, if there is only one DR, say only router 110R.2 is a DR, then the multicast traffic cannot use the additional bandwidth provided by the path through router 110R.3.