There are three fundamental types of IP addresses: unicast, broadcast, and multicast. A unicast address is designed to transmit a packet to a single destination. A broadcast address is used to send a datagram to an entire subnetwork. A multicast address is designed to enable the delivery of datagrams to a set of hosts that have been configured as members of a multicast group and that may be scattered across various subnetworks. Multicasting is not connection oriented.
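The distinction among the three address types can be illustrated with a short Python sketch built on the standard `ipaddress` module. The function name `classify` and the example subnet are assumptions made for illustration; the sketch covers IPv4 only and treats the limited broadcast address and the directed broadcast address of the given subnet as broadcast.

```python
import ipaddress


def classify(address: str, subnet: str) -> str:
    """Classify an IPv4 destination as unicast, broadcast, or multicast.

    `subnet` is the local subnetwork in prefix notation; it is needed
    because a directed broadcast address depends on the subnet mask.
    """
    addr = ipaddress.IPv4Address(address)
    net = ipaddress.IPv4Network(subnet)
    if addr.is_multicast:  # 224.0.0.0/4
        return "multicast"
    if addr == net.broadcast_address or addr == ipaddress.IPv4Address("255.255.255.255"):
        return "broadcast"
    return "unicast"


if __name__ == "__main__":
    for destination in ("192.0.2.10", "192.0.2.255", "239.194.0.5"):
        print(destination, classify(destination, "192.0.2.0/24"))
```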
A multicast IP packet is assigned a “group address” in the destination address field of the IP header. A host may join or leave a multicast group at any time and may be a member of any number of multicast groups simultaneously.
A group membership protocol is employed by routers to learn about the presence of group members on their directly attached subnetworks. When a host joins a multicast group, it transmits a group membership protocol message for the group(s) that it wishes to receive, and sets its IP process and network interface card to receive frames addressed to the multicast group.
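On a typical host, joining a group is requested through the sockets API, and the operating system then sends the membership message and programs the network interface. The following Python sketch shows one way this might look for IPv4; the helper names `build_mreq` and `join_group` are hypothetical, and the choice of interface `0.0.0.0` (let the operating system pick) is an assumption.

```python
import socket


def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte IPv4 membership request: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(group: str, port: int) -> socket.socket:
    """Open a UDP socket and ask the OS to join `group` (hypothetical helper).

    The IP_ADD_MEMBERSHIP option causes the host's IP stack to report
    membership for the group and to accept frames addressed to it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock
```

Closing the socket, or setting `IP_DROP_MEMBERSHIP` with the same request, leaves the group.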
IP multicast allows the network elements in the path of a given traffic flow to replicate that flow so that it can be sent to multiple receivers from a single source. For example, if source A sends content to B, then B replicates the content and sends one copy to C and another to D.
Various protocols are used to manage the delivery of multicast content so that the content is directed downstream to subscribing hosts efficiently. Efficiency in this context encompasses minimizing unnecessary forwarding and avoiding loops among hosts and routers.
The Internet Group Management Protocol (IGMP) runs between hosts and their immediately neighboring multicast routers. The mechanisms of the protocol allow a host to inform its local router that it wishes to receive transmissions addressed to a specific multicast group. Also, routers periodically query the LAN to determine if known group members are still active. If there is more than one router on the LAN performing IP multicasting, one of the routers is elected “querier” and assumes the responsibility of querying the LAN for group members.
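The text does not say how the querier is chosen. As an illustration only, the sketch below assumes the IGMPv2 convention, in which the multicast router with the numerically lowest IP address on the LAN becomes the querier. The function name `elect_querier` and the router addresses are hypothetical.

```python
import ipaddress


def elect_querier(router_addresses: list[str]) -> str:
    """Return the address of the elected querier.

    Assumes the IGMPv2 rule: the lowest IP address wins. Addresses are
    compared numerically, not as strings.
    """
    return str(min(ipaddress.IPv4Address(a) for a in router_addresses))


if __name__ == "__main__":
    print(elect_querier(["192.0.2.3", "192.0.2.1", "192.0.2.20"]))
```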
Based on the group membership information learned through IGMP, a router is able to determine which (if any) multicast traffic needs to be forwarded to each of its “leaf” subnetworks. Multicast routers use this information, in conjunction with a multicast routing protocol, to support IP multicasting across the Internet.
A series of routes from a source is referred to as a source “tree.” A source tree is the simplest form of distribution tree. The source host of the multicast traffic is located at the root of the tree, and the receivers are located at the ends of the branches. Multicast traffic travels from the source host down the tree toward the receivers. The decision about which interface a multicast packet should be transmitted on is based on a multicast forwarding table. This table consists of a series of multicast state entries that are cached in the router. State entries for a source tree use the notation (S, G). The letter S represents the IP address of the source, and the letter G represents the group address.
For example, a source 196.7.89.10 that is transmitting multicast packets to the destination group 239.194.0.5 has a forwarding cache entry of (196.7.89.10, 239.194.0.5). A separate source tree exists for every source that is transmitting multicast packets, even if those sources are transmitting data to the same group. This means that there will be an (S, G) forwarding state entry for every active source in the network. For example, if another source, such as 196.1.23.4, became active and also transmitted to group 239.194.0.5, then an additional state entry would be created as (196.1.23.4, 239.194.0.5). Source trees provide optimal routing at the cost of additional multicast state information in the network.
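The one-entry-per-source behavior can be sketched as a small table keyed by the (S, G) pair. The class name `SourceTreeTable` and its methods are hypothetical and are not taken from any router implementation; the sketch only shows that two sources sending to the same group produce two distinct entries.

```python
class SourceTreeTable:
    """Toy multicast forwarding cache keyed by (source, group)."""

    def __init__(self) -> None:
        # (S, G) -> set of outgoing interface names (empty until receivers join)
        self._entries: dict[tuple[str, str], set[str]] = {}

    def add_source(self, source: str, group: str) -> None:
        """Create the (S, G) entry for an active source if it does not exist."""
        self._entries.setdefault((source, group), set())

    def entries_for_group(self, group: str) -> list[tuple[str, str]]:
        """Return every (S, G) entry whose group address is `group`."""
        return sorted(key for key in self._entries if key[1] == group)

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    table = SourceTreeTable()
    table.add_source("196.7.89.10", "239.194.0.5")
    table.add_source("196.1.23.4", "239.194.0.5")  # same group, different source
    print(len(table), table.entries_for_group("239.194.0.5"))
```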
Packets from a source must arrive on the correct interface, which is typically the interface on the router's path back toward that source. A packet that arrives on the correct interface is replicated on one or more outgoing interfaces that are associated with the group address and recorded in an Outgoing Interface List (“OIL” or “OLIST”).
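The arrival check and the replication step can be sketched together. In this Python illustration the names `StateEntry` and `forward`, and the interface names such as `eth0`, are assumptions; a real device performs the equivalent lookup in hardware.

```python
from dataclasses import dataclass, field


@dataclass
class StateEntry:
    """State kept for one (S, G) pair."""
    incoming_interface: str                      # interface expected for the source
    oil: set[str] = field(default_factory=set)   # Outgoing Interface List


def forward(table: dict[tuple[str, str], StateEntry],
            source: str, group: str, arrival_interface: str) -> list[str]:
    """Return the interfaces on which a packet should be replicated.

    The packet is dropped (empty list) if there is no (S, G) entry or if
    it arrived on an interface other than the expected one.
    """
    entry = table.get((source, group))
    if entry is None or arrival_interface != entry.incoming_interface:
        return []
    # Never send the packet back out of the interface it arrived on.
    return sorted(entry.oil - {arrival_interface})


if __name__ == "__main__":
    table = {("196.7.89.10", "239.194.0.5"): StateEntry("eth0", {"eth1", "eth2"})}
    print(forward(table, "196.7.89.10", "239.194.0.5", "eth0"))
    print(forward(table, "196.7.89.10", "239.194.0.5", "eth3"))
```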
The (S, G) pair data and the OIL data are typically referred to as state data. These data are maintained by each multicast switch and router in a source tree. For high-end routing and switching platforms, scaling is not an issue. However, lower-end platforms such as those typically used as top-of-rack (or “TOR”) switches in a datacenter environment often have significantly reduced scaling capabilities. In a datacenter environment, the scaling limitations manifest as a hard limit on the maximum number of simultaneous multicast groups that can flow through the device, usually a few hundred or a few thousand groups per device.
In unicast routing, routes to individual hosts are not generally contemplated. Instead, bit masks are used to allow for an aggregated match for a given set of destination IP addresses. In the case where multiple candidate matches exist, the longest or most specific bit mask match is always preferred. This is commonly notated in the form x/y, where x is an address prefix (e.g., 192.0.2.0 or 2001:db8::) and y is the number of left-justified bits in the bitmask. For example, 2001:db8::/32 means that the bitmask covers the first 32 bits of the prefix in question. Moreover, if the routing table contains routes for both 192.0.2.0/24 and 192.0.2.64/28, then when routing a packet to 192.0.2.68, the latter route will be preferred. An equivalent concept does not currently exist for multicast routing. All multicast routing is done by explicit match on the (S, G) pair as described above.
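The longest-match rule for unicast routes can be demonstrated with Python's standard `ipaddress` module. The function name `longest_prefix_match` is hypothetical, and the linear scan is chosen for clarity; production routers typically use tries or specialized hardware for this lookup.

```python
import ipaddress
from typing import Optional


def longest_prefix_match(destination: str, routes: list[str]) -> Optional[str]:
    """Return the most specific route in `routes` that contains `destination`.

    Returns None when no route matches.
    """
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(route) for route in routes]
    matches = [n for n in networks if n.version == dest.version and dest in n]
    if not matches:
        return None
    # The route with the largest prefix length is the most specific match.
    return str(max(matches, key=lambda n: n.prefixlen))


if __name__ == "__main__":
    routing_table = ["192.0.2.0/24", "192.0.2.64/28", "2001:db8::/32"]
    print(longest_prefix_match("192.0.2.68", routing_table))
    print(longest_prefix_match("192.0.2.200", routing_table))
```

With the routes from the text, 192.0.2.68 falls inside both 192.0.2.0/24 and 192.0.2.64/28 (which spans 192.0.2.64 through 192.0.2.79), so the /28 is selected, while 192.0.2.200 matches only the /24.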
In a datacenter, there will typically be a switch/router that aggregates the traffic from a number of individual or blade center style servers or other network appliances (purpose-built hardware for a certain task). The collection of equipment that is connected to a particular switch/router is referred to herein as an equipment “rack” without regard to the physical arrangement of the switch/router and its connected devices. There may be multiple racks in a datacenter, and each switch/router aggregates data upward to a core switch so that the aggregated data may be sent towards its final destination.
In the case where servers in a datacenter are being used as sources for multicast content, such as in large-scale IP video and audio distribution, each server may have one or more specific types of content in varying formats and bitrates that it is streaming to one or more individual multicast group addresses for distribution to the rest of the network. In this configuration, there may be hundreds or even thousands of individual, discrete multicast sources and groups in a single rack lineup, each representing unique content being served from this location.
The servers in the datacenter may also receive data streams, including multicast data streams, from other devices. The received data streams may be processed by the servers such that a single stream may exit the server as multiple streams. By way of illustration and not by way of limitation, a data stream may be processed to provide content to devices having different capabilities, operating systems, and display characteristics.
A commodity switch/router is challenged to maintain this level of multicast state. When the capacity of the switch/router is reached, it is typically replaced with hardware with higher capacities and capabilities at significant additional cost.
At the hardware level, a typical switch/router has a control plane that interacts with other devices at the protocol level (for example, BGP, IGMP) to learn the topology of available paths and routes and a forwarding plane that moves data from one port to another. The control plane uses the topology data to instruct the forwarding plane how data is to be routed through the switch/router.
In the case of multicasting, the control plane relies on IGMP (either through direct participation or snooping) to determine which ports are connected to hosts that want to join a particular multicast group having a specified (S, G) pair. The joins are used to populate the OIL, which assigns particular multicast traffic to particular ports. The data from a multicast source (S, G) may be replicated to multiple ports that are each connected to at least one subscribing host. The replication usually occurs in the forwarding plane. Each (S, G) pair has unique control plane entries that relate that pair to a port. As the number of multicast streams handled by a rack increases, the demand on the switch/router increases and ultimately may exceed the capacity of the switch/router to manage the volume of state information required for effective multicast routing.
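The relationship between joins, the OIL, and the state limit of the device can be sketched as follows. The class `ControlPlane`, the parameter `max_entries`, and the port names are hypothetical; the capacity of 2 is chosen only to make the exhaustion case easy to see, whereas the text describes real limits of a few hundred or a few thousand groups.

```python
class ControlPlane:
    """Toy control plane that builds an OIL from joins, with a state limit."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries               # hypothetical device limit
        self.oil: dict[tuple[str, str], set[str]] = {}  # (S, G) -> ports

    def handle_join(self, source: str, group: str, port: str) -> bool:
        """Record that a host on `port` wants (source, group).

        Returns False if a new (S, G) entry would exceed the device limit.
        """
        key = (source, group)
        if key not in self.oil:
            if len(self.oil) >= self.max_entries:
                return False  # no room for additional multicast state
            self.oil[key] = set()
        self.oil[key].add(port)
        return True

    def handle_leave(self, source: str, group: str, port: str) -> None:
        """Remove `port` from the OIL and free the entry when it is empty."""
        key = (source, group)
        ports = self.oil.get(key)
        if ports is None:
            return
        ports.discard(port)
        if not ports:
            del self.oil[key]


if __name__ == "__main__":
    cp = ControlPlane(max_entries=2)
    print(cp.handle_join("196.7.89.10", "239.194.0.5", "port1"))
    print(cp.handle_join("196.1.23.4", "239.194.0.5", "port2"))
    print(cp.handle_join("196.1.23.4", "239.194.0.6", "port3"))  # exceeds limit
```

Adding a second port to an existing (S, G) entry consumes no additional entry in this sketch; only a new (S, G) pair counts against the limit.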